ABSTRACT


This  thesis  focuses  on  the  design  and  analysis  of 
portable  efficient  algorithms  for  graph  theoretic  problems. 
The  aim  is  to  gain  a  deeper  insight  into  the  nature  of 
parallel  computation;  in  particular  concerning  the  time  and 
hardware  resource  tradeoffs  as  well  as  the  portability  of 
algorithms  among  computer  models.  The  class  of  problems 
investigated comprises the following: finding the lowest common
ancestors for vertex pairs of a directed tree; finding all
fundamental cycles of an undirected graph; determining a
directed spanning forest of an undirected graph; and solving the
two-colorability, bridge-connectivity, bridge-connectivity
augmentation and biconnectivity problems of an undirected
graph. For the PRAM (Parallel RAM), it is shown that all
these algorithms achieve the O(lg^2 n) time bound (lg n denotes
⌈log_2 n⌉ and n is the size of the vertex set), with the first
two algorithms using n⌈n/lg n⌉ processors and the remaining
algorithms using n⌈n/lg^2 n⌉ processors. With the exception of
the  first  two  algorithms,  these  results  are  optimal  with 
respect  to  the  time-processor  product  for  dense  graphs.  It 
is also shown that for any error probability e, where 0 < e < 1,
these algorithms could run in probabilistic O(lg n) time
using n^3 |E| lg n processors, where E is the edge set of the
undirected  graph.  The  performance  of  these  algorithms  when 
running  on  an  abstract  model  is  also  analyzed.  It  is  shown 
that  they  require  the  same  amount  of  hardware  resources  and 
at most a factor of max(lg d', lg d'') + 1, 1 ≤ d', d'' ≤ n, more time
than the ordinary matrix multiplication algorithm on the
abstract model (d' and d'' are diameters). This result
immediately implies that all these algorithms could achieve
the O(n) time bound on the MCN (Mesh-connected Networks),
the O(lg^2 n) time bound on the PSN (Perfect Shuffle
Networks), CCC (Cube-connected Cycles), OTN (Orthogonal Tree
Networks), OTC (Orthogonal Tree Cycles) and SIMD-CCC (SIMD
Cube-connected Computers), and the O(lg n) time bound on the
WRAM model using at most n^4 processors. The expected time
complexity  of  these  algorithms  is  also  discussed.  It  is 
shown  that  with  the  exception  of  the  last  two  problems,  all 
the algorithms have expected time O(lg n · lglg n) on the PSN,
CCC,  OTN,  OTC,  SIMD-CCC  and  the  PRAM  and  have  expected  time 
O(lglgn)  on  the  WRAM.  It  is  also  shown  that  for  the 
conventional  sequential  computer  model,  the  biconnectivity 
and  bridge-connectivity  algorithms  could  run  in  optimal  time 
and  space. 

A  general  program  scheme  for  finding  the  bridges  of  an 
undirected  graph  is  also  presented.  It  is  shown  that  by 
substituting  various  specific  functions  for  the  parameters 
in  the  program  scheme,  a  number  of  optimal  algorithms  for 
finding  the  bridges  can  be  derived.  Included  in  these  are 
the  known  optimal  sequential  algorithms  and  new  optimal 
parallel  algorithms  for  finding  the  bridges. 

The possibility of breaching the O(lg^2 n) time bound is
also  examined.  It  is  shown  that  for  the  recognition  problems 
of split graphs and permutation graphs, O(lg n) deterministic

Chapter 1
INTRODUCTION 


1.1  Background 

The  advances  in  device  technology  over  the  past  decade 
have contributed to an enormous increase in the speed of
computation. However, as the speed of computer devices approaches
its ultimate physical limits, system performance in
the  future  can  only  be  significantly  improved  through 
parallelism.  This  has  stimulated  much  of  the  research 
activity  on  parallel  computation  during  the  past  decade. 
Since  the  basic  role  of  computers  is  to  carry  out 
computation,  the  design  of  efficient  algorithms  for  various 
classes  of  problems  is  always  desirable.  As  a  result, 
research  in  this  area  has  been  very  active.  In  this  thesis, 
our  concern  is  graph  theoretic  problems. 

Graph  theoretic  problems  arise  naturally  in  many 
contexts.  For  instance,  scheduling  in  operations  research, 
analyzing  networks  and  designing  potential  circuit  boards  in 
electrical  engineering,  designing  reliable  networks  for 
communication,  identifying  isomorphic  structures  in  chemical 
compounds  and  investigating  the  fine  structures  of  the  gene, 
etc.,  can  all  be  conveniently  formulated  in  terms  of  graphs. 
Due  to  the  widespread  applications  of  graphs,  the  design  of 
efficient  algorithms  for  graph  theoretic  problems  is  of  both 
theoretical  and  practical  interest.  For  the  conventional 
sequential  computer,  an  enormous  number  of  papers  devoted  to 
efficient graph algorithms have been published over the last
twenty years. By contrast, there were few such algorithms
for parallel computer models until the mid-seventies when
several O(lg^2 n) time¹ parallel algorithms for the
graph-connectivity  and  transitive  closure  problems  appeared. 
Since  then,  the  design  of  efficient  algorithms  for  graph 
theoretic  problems  on  parallel  computer  models  has  drawn  a 
great  deal  of  interest.  However,  despite  these  efforts, 
efficient  parallel  graph  algorithms  are  still  comparatively 
rare.

Of the parallel algorithms published in the literature,
most are designed for the SIMD shared memory model, allowing
read conflicts but not write conflicts. Recently, the name
PRAM (Parallel RAM) was attached to this model [WYLL79,
BORO82] and has been widely accepted. Briefly speaking, the
PRAM has an unlimited number of sequential RAMs, all of
which have access to a common memory of unlimited size (we
shall call these sequential RAMs "processors" henceforth).
Each processor is assigned a unique positive integer called
the processor index. Several processors may read the same
memory location at the same time, but at no time may more
than one of them write into the same memory location. The
processors are synchronized and operate under the control of
a single instruction stream propagated by a control unit.
There is also an enable/disable mask which can be used to
prevent a subset of the processors from executing an
instruction.

¹ lg n stands for ⌈log_2 n⌉ and n is the size of the vertex set
of the undirected graph.

As  opposed  to  the  conventional  sequential  computer,  one 
has  to  account  for  the  amount  of  hardware  resources  used 
when  one  designs  algorithms  for  a  parallel  computer  model. 
The  hardware  resources  are  measured  in  terms  of  the  number 
of  processors,  or  the  size  of  the  chip  area  if  VLSI 
technology  is  employed.  This  makes  the  situation  more 
complicated,  as  there  are  now  three  resources  —  time,  memory 
space,  hardware  resource  —  for  which  one  has  to  account. 
Partly because the relationships between these three
resources are not yet well understood, and partly because
memory space is cheap compared to time and hardware
resources, researchers have usually ignored the space
resources  (unless  they  are  unreasonably  large)  and 
concentrated  on  minimizing  the  amount  of  time  and  hardware 
resources  (in  particular,  the  number  of  processors)  used  in 
designing  parallel  algorithms. 

From the description of the PRAM, it is not difficult to
perceive that there is a close relationship between the
sequential RAM and the PRAM. Let A be an algorithm designed for
a problem P on the PRAM. If A takes T(n) time and P(n)
processors for an instance of P of size n, then a
sequential RAM can simulate the execution of
algorithm A on the PRAM by executing every instruction of A
P(n) times. At the ith repetition, the sequential RAM
behaves exactly like the ith processor of the PRAM when the
PRAM is executing A. Clearly, it takes a total of
T(n)·P(n) time for the sequential RAM to complete the
execution of algorithm A. An implication of this observation
is that T(n)·P(n) ≥ L(n), where L(n) is the lower bound for
problems of size n on the sequential RAM.
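
The simulation argument can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the thesis: instructions are modelled as functions returning an optional (address, value) write, processors are indexed from 0 rather than 1, and the names simulate_pram and tree_sum_step are hypothetical. The step counter shows the T(n)·P(n) cost of replaying every instruction once per processor.

```python
def simulate_pram(instructions, num_procs, memory):
    """Replay a synchronous PRAM program on one sequential processor.

    Each instruction is a function (proc_index, snapshot) -> None or
    (address, value). Returns the number of sequential steps used.
    """
    steps = 0
    for instr in instructions:
        snapshot = list(memory)      # all reads see the memory before the step
        written = set()
        for i in range(num_procs):   # the i-th repetition plays processor i
            steps += 1
            write = instr(i, snapshot)
            if write is not None:
                addr, value = write
                # The PRAM forbids two processors writing the same location.
                assert addr not in written, "write conflict"
                written.add(addr)
                memory[addr] = value
    return steps


def tree_sum_step(d):
    # Hypothetical instruction: enabled processors add the value d cells away.
    def instr(i, mem):
        if i % (2 * d) == 0:
            return i, mem[i] + mem[i + d]
        return None                  # masked-off processor does nothing
    return instr


memory = [3, 1, 4, 1, 5, 9, 2, 6]
program = [tree_sum_step(1), tree_sum_step(2), tree_sum_step(4)]  # T(n) = 3
steps = simulate_pram(program, num_procs=8, memory=memory)         # P(n) = 8
```

Here the three-instruction program run with eight processors costs 3 · 8 sequential steps, and memory[0] ends up holding the sum of the eight inputs.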

Let G(V,E) be a graph where |V| = n.² There are two data
structures which have been widely used to represent G on the
sequential RAM. These are the adjacency list and the adjacency
matrix [TARJ72, EVEN79]. If an adjacency list is used to
represent G, then it is well known that L(n) = Ω(n + |E|).
However, adjacency lists seem to be inappropriate for SIMD
computers. A more appropriate data structure which has been
widely used is the adjacency matrix, and throughout this
thesis we shall use adjacency matrices to represent graphs on
parallel computers. For graph theoretic problems concerning
non-trivial monotone³ graph properties, it has been proven
that if the input graph is represented by an n×n adjacency
matrix, then L(n) = Ω(n^2) [KIRK74, RIVE76]. Moreover, it is
easily shown that for non-trivial graph theoretic⁴ problems,
Ω(lg n) is a lower bound for T(n) on the PRAM [SAVA77]. As a
consequence, P(n) ≥ ⌈n^2/lg n⌉ on the PRAM for achieving the
O(lg n) time bound for non-trivial graph theoretic problems
if the adjacency matrix is used as the input data structure. In
other words, in designing parallel graph algorithms on the
PRAM, the O(lg n) time and ⌈n^2/lg n⌉ processors bound is the
best one can possibly achieve if adjacency matrices are used
to represent graphs. Up to the present, no one has managed
to achieve the O(lg n) time bound on the PRAM. The best time
bound achieved so far is O(lg^2 n), and there is strong
evidence that O(lg^2 n) may be a lower bound for time on the
PRAM, although no proof has been given. Consequently, the
more promising optimal bounds one could achieve on the PRAM
are the O(lg^2 n) time and ⌈n^2/lg^2 n⌉ processors bounds.

² All graph-theoretic terms are defined in Section 1.3.

³ A graph property is non-trivial if there are some graphs
possessing the property and some which do not. A graph
property is monotone if whenever a graph G(V,E) possesses
the property, then any graph G'(V,E') where E is a subset of
E' also possesses the property.

⁴ A graph theoretic problem is non-trivial if at least one
of its outputs is a function of all its inputs.
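
The processor bounds quoted in this discussion follow from combining the simulation inequality with the adjacency-matrix lower bound. The short derivation below restates that step; it ignores the ceilings and constant factors that the text carries.

```latex
\begin{align*}
T(n)\,P(n) &\ge L(n) = \Omega(n^2), \\
T(n) = O(\lg n) &\implies P(n) = \Omega\!\left(\frac{n^2}{\lg n}\right), \\
T(n) = O(\lg^2 n) &\implies P(n) = \Omega\!\left(\frac{n^2}{\lg^2 n}\right).
\end{align*}
```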

Many graph theoretic problems do have parallel
algorithms achieving the O(lg^2 n) time bound. However, the
number of processors used to achieve this time bound is
always greater than O(n^2/lg^2 n). The only exception is the
graph-connectivity problem and some of its equivalent
problems. The first parallel algorithm for this problem
running on the PRAM achieved the O(lg^2 n) time bound with n^3
processors [ARJO75, REGH78]. The processor bound was then
improved to n^2 independently by Hirschberg [HIRS76] and
Savage [SAVA77] and to O(|E| + n lg n) by Ja'Ja' [JAJA78] (E is
the edge set of the given graph). Hirschberg, Chandra and
Sarwate [HIRS79] further improved the processor bound to
O(n^2/lg n) and Wyllie improved it to n + |E| [WYLL79]. Finally,
Chin, Lam and Chen [CHIN81, CHIN82] managed to improve the
bound to O(n^2/lg^2 n). (Note that the algorithms of Ja'Ja' and
Wyllie can have T(n)·P(n) below O(n^2). This does not
contradict our previous discussion, because they did
not use adjacency matrices to represent graphs. Their
results are not optimal for either sparse graphs or dense
graphs.) Parallel algorithms for other graph theoretic
problems which achieve the O(lg^2 n) time bound but with a
greater number of processors on the PRAM can be found in
[ARJO75, ATAL82, CHAN76, GOLD77, JAJA78, JAJA82, REGH78, SAVA77,
SAVA81]. Others which run in Ω(n) time can be found in
[ARJO75, ECKS77a, ECKS77b, REGH78, SHIL81, VISH81a].

The  PRAM  has  received  the  most  attention  in  the  past 
decade  but  has  also  received  criticism  for  its 
impracticability  for  construction  by  current  technology.  In 
view  of  this,  some  researchers  began  to  design  graph 
algorithms  for  other  more  restrictive  models  which  can  be 
constructed  with  current  technology.  Apparently,  designing 
graph  algorithms  on  these  models  is  much  more  difficult  than 
on  the  PRAM.  Up  to  the  present  time,  only  a  few  algorithms 
for  some  basic  graph  theoretic  problems  (mainly  for  the 
graph-connectivity  problem)  have  been  reported  for  a  few  of 
these models [NASS81, NATH81, NATH82, ATAL82, AWER83].

1.2 Thesis Outline and Main Results

In  this  thesis,  we  focus  on  the  design  and  analysis  of 
efficient  algorithms  for  a  class  of  graph  theoretic  problems 
on  various  computer  models.  This  class  of  problems  includes 
the  following:  finding  the  lowest  common  ancestors  for 
vertex  pairs  of  a  directed  tree;  finding  all  fundamental 
cycles of an undirected graph; determining a directed
spanning forest of an undirected graph; solving the
two-colorability, bridge-connectivity, bridge-connectivity
augmentation and biconnectivity problems of an undirected
graph; and recognizing split graphs and permutation graphs.
This class of problems has drawn a great deal of interest
recently and efficient algorithms for solving them on
various computer models have been developed [ATAL82, SAVA81,
REIF82a, REIF82b].

Traditionally,  whenever  an  algorithm  is  presented,  it 
is  designed  with  a  particular  model  in  mind,  and  its 
complexity  analysis  is  provided  for  that  model  only.  There 
are  at  least  two  drawbacks  with  this  approach.  Firstly,  it 
is difficult to compare two different algorithms for the
same problem if they are designed for different computer
models. Secondly, extra effort has to be made in order to
carry an algorithm over to other models. A typical example is
Hirschberg's graph-connectivity algorithm, which was
originally designed for the PRAM. It was then implemented on
the MCN (Mesh-connected Networks) by Nassimi and
Sahni [NASS81], on the PSN (Perfect Shuffle Networks) by
Schwartz [SCHW80], on the WRAM by Shiloach and
Vishkin [SHIL82a, VISH82], and finally on the PSN,
OTN (Orthogonal Tree Networks) and OTC
(Orthogonal Tree Cycles) by Nath, Maheshwari and
Bhatt [NATH81, NATH82]. It would be convenient if the
complexity  analysis  of  an  algorithm  could  be  given  in  such  a 
way  that  it  would  be  valid  for  any  model  satisfying  certain 
moderate conditions.

In this thesis, we shall design efficient algorithms
which are portable in the sense that they can run
efficiently on many computer models. In particular, they run
on an abstract model, called MMM, which includes a large
class  of  parallel  computer  models  as  special  cases. 

In  the  next  section,  definitions  are  provided  for  terms 
and  notations  to  be  used  in  subsequent  chapters. 

In Chapter 2, efficient algorithms are presented on the PRAM for
the class of graph-theoretic problems listed above, except
the last two problems. All these algorithms
achieve the O(lg^2 n) time bound, with the first two
algorithms using n⌈n/lg n⌉ processors and the remaining
algorithms using n⌈n/lg^2 n⌉ processors. In all cases, our
algorithms are better than the best previously known
algorithms and in most cases reduce the number of processors
used by a factor of n lg n. Moreover, our algorithms are
optimal  with  respect  to  the  time-processor  product  for  dense 
graphs  with  the  exception  of  the  first  two  algorithms. 

In  Chapter  3,  it  is  shown  how  the  algorithms  presented 
in  Chapter  2  could  be  implemented  efficiently  on  other  more 
restrictive  SIMD  models.  This  is  accomplished  by  first 
proposing  an  abstract  model,  called  MMM,  which  satisfies 
certain  moderate  constraints  and  then  implementing  the 
algorithms  on  the  MMM.  It  is  shown  that  most  of  these 
algorithms achieve the O(lg^2 n) time bound with ⌈n^3/lg n⌉
processors on the PRAM and many restrictive SIMD models; the
O(lg n) time bound with n^3 processors on the WRAM (a stronger
PRAM); and the O(n) time bound with n^2 processors on the
Mesh-connected Networks.

In  Chapter  4,  the  implementation  of  these  algorithms  on 
the  conventional  sequential  model  is  explored.  It  is  shown 
that  the  biconnectivity  algorithm  can  be  implemented  on  the 
sequential  computer  in  optimal  time  and  space.  Moreover,  the 
algorithm  is  shown  to  be  a  generalization  of  the  best 
previously  known  sequential  algorithm  for  the  same  problem. 
The  bridge-connectivity  algorithm  is  also  generalized  to  a 
general  program  scheme  for  finding  the  bridges  in  an 
undirected  graph.  This  general  program  scheme  includes  most 
of  the  best  previously  known  sequential  algorithms  as 
special  cases.  In  addition  to  that,  new  parallel  algorithms, 
including  the  one  presented  in  Chapter  2,  can  be  deduced 
from  it. 

In Chapter 5, the possibility of breaching the O(lg^2 n)
time bound is examined. Based on some of the recent results
due to Reif [REIF82a, REIF82b], it is shown that given any
error probability e, 0 < e < 1, our algorithms could run in
O(lg n) time using |E| n^3 lg n processors on the PRAM with
probability less than e that an error will occur. It is also
shown that the expected time complexity for most of the
algorithms described in Chapter 3 is O(lg n · lglg n) on the
PSN, CCC, OTN, OTC, SIMD-CCC and PRAM and is O(lglg n) on the
WRAM. The recognition problems for split graphs and
permutation graphs are also studied and O(lg n) deterministic
time algorithms are presented.

Finally, in Chapter 6, our results are summarized and
some  open  problems  for  further  research  are  listed. 

1.3  Definitions  and  Notations 

A graph G(V,E) consists of a finite non-empty set V of
vertices and a set E of pairs of vertices called edges.
Without loss of generality, we assume V = {1, 2, ..., n}
throughout this thesis. If the edges are unordered pairs,
then G is undirected; otherwise G is directed. G(V,E) is
sparse if |E| = O(n) and is dense if |E| = O(n^2). For undirected
graphs, an edge joining the vertices a and b is represented
by (a,b). Furthermore, (a,b) and (b,a) are considered
identical elements. For directed graphs, an edge from vertex
a to vertex b is represented by <a,b>; a is called the tail
of the edge while b is called the head of the edge. The
underlying graph of a directed graph G'(V,E') is an
undirected graph G(V,E) such that (u,v) ∈ E iff <u,v> ∈ E' or
<v,u> ∈ E'. A graph G'(V',E') is a subgraph of a graph G(V,E)
if V' is a subset of V and E' is a subset of E. Let V' be a
subset of V. The graph G'(V', (V'×V') ∩ E) is called the subgraph
of G(V,E) induced by V' (∩ stands for set intersection
here). An adjacency matrix M of an undirected (resp.
directed) graph G(V,E) is an n×n Boolean matrix such that
M[u,v] = 1 iff (u,v) ∈ E (resp. <u,v> ∈ E).
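
As a small illustration of these definitions, the sketch below builds an adjacency matrix, the underlying graph of a directed graph, and an induced subgraph. The function names and the choice of Python lists and sets are assumptions made for the example; vertices are 1..n as in the text, so indices are shifted down by one when addressing the matrix.

```python
def adjacency_matrix(n, edges, directed=False):
    """n x n 0/1 matrix M with M[u-1][v-1] = 1 iff the edge is present."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u - 1][v - 1] = 1
        if not directed:
            m[v - 1][u - 1] = 1   # (u,v) and (v,u) are the same edge
    return m


def underlying_graph(directed_edges):
    """Undirected edge set: {u,v} is present iff <u,v> or <v,u> is."""
    return {frozenset((u, v)) for u, v in directed_edges}


def induced_subgraph(edges, subset):
    """Edges of G with both end-vertices in the chosen vertex subset."""
    return {(u, v) for u, v in edges if u in subset and v in subset}


m = adjacency_matrix(3, [(1, 2), (2, 3)])
und = underlying_graph([(1, 2), (2, 1), (2, 3)])
sub = induced_subgraph({(1, 2), (2, 3)}, {1, 2})
```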

Let P = {v_0, v_1, ..., v_k} be a sequence of vertices of an
undirected graph G(V,E). P is called a walk in G if
(v_i, v_{i+1}) ∈ E, 0 ≤ i < k. We say that (v_i, v_{i+1}) is an edge on P.
The length of P is k. A path is a walk in which v_i ≠ v_j for
i ≠ j. A cycle is a walk in which v_0 = v_k and no edge in G
appears more than once. A simple cycle is a path in which
v_0 = v_k. Directed walks, directed paths, directed cycles and
directed simple cycles are defined in a similar way.

Let  G(V,E)  be  an  undirected  graph;  if  for  every  two 
vertices u, v in V, there is a path in G joining u and v,
then G is connected. Each maximal connected subgraph of G is
called a connected component of G. The diameter of G is the
length of the longest of the minimal (shortest) paths over all
vertex pairs if G is connected and is the largest diameter of all the
connected components of G if G is disconnected. Diameters
for directed graphs can be defined in a similar way. Let v
be a vertex in G. The degree of v is the number of edges in
G incident on v. If the degree of v is 0, then v is called
an isolated vertex; if the degree of v is 1, then v is
called a pendant. If all vertices in G have the same degree,
then  G  is  a  regular  graph. 
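
The diameter and degree definitions can be checked on a small graph with the sequential sketch below. It is not one of the thesis's algorithms: it simply runs a breadth-first search from every vertex, and the adjacency-set dictionary and function names are assumptions for the example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path lengths from source to every vertex it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    # A search from u stays inside u's component, so the maximum over all
    # sources is the largest component diameter, as in the definition.
    return max(max(bfs_distances(adj, u).values()) for u in adj)


# Path 1-2-3-4 plus the isolated vertex 5.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: set()}
pendants = sorted(v for v in adj if len(adj[v]) == 1)
isolated = sorted(v for v in adj if len(adj[v]) == 0)
```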

A  tree  is  a  connected  undirected  graph  with  no  cycles 
in it. Let T(V',E') be a directed graph. T is said to have a
root r if r ∈ V' and every vertex v ∈ V' is reachable from r
via a directed path. If the underlying undirected graph of T
is a tree, then T is a directed tree. If, moreover, the
underlying graph of T is a subgraph of a connected
undirected graph G(V,E) such that V' = V, then T is a directed
spanning tree in G. A directed forest is a graph whose
connected components are directed trees. If T is a directed
forest such that each directed tree in T is a directed
spanning tree of a connected component of an undirected
graph G and vice versa, then T is called a directed spanning
forest of G. If the edges of T are all reversed, the
resulting graph is called an inverted spanning forest of G.
Inverted spanning trees, inverted trees, inverted forests,
etc. are defined similarly. Let <a,b> (resp. <b,a>) be an
edge in a directed (resp. inverted) tree; then a is the father of
b and b is a son of a. Let c, d be any two vertices in a
directed (resp. inverted) tree T; c is an ancestor of d if
c = d or there exists a directed path from c to d (resp. from
d to c) in T. c is a proper ancestor of d if c is an
ancestor of d and c ≠ d. d is a descendant of c if c is an
ancestor of d.

Throughout  this  thesis,  we  denote  the  'undirected'  path 
from  vertex  a  to  vertex  b  in  a  (directed)  tree  by  [a*-*b] , 
and  by  [a*-*b)  if  vertex  b  is  to  be  excluded.  If  the  path 
consists  of  at  least  one  edge,  then  the  ' * '  is  removed  from 
the  notation.  Moreover,  we  denote  uSv  iff  u  is  an  ancestor 
of  v  in  the  tree  and  u*V  iff  u  is  a  proper  ancestor  of  v. 

Let T(V,E') be an inverted (directed) spanning forest of an
undirected graph G(V,E). The graph G−T is an undirected
graph whose vertex set is V and whose edge set is
E − {(u,v) | <v,u> ∈ E'}. Any edge in G−T is called a non-tree
edge. To simplify our notation, we shall use E−E' to denote
the edge set of G−T. Let G_1(V_1,E_1) and G_2(V_2,E_2) be two
graphs. G_1 ∪ G_2 is a graph whose vertex set is V_1 ∪ V_2 and whose
edge set is E_1 ∪ E_2. G_1 ∩ G_2 is a graph whose vertex set is
V_1 ∩ V_2 and whose edge set is E_1 ∩ E_2 (∩ stands for set
intersection here). G_1 ⊕ G_2 is a graph whose vertex set is
V_1 ∪ V_2 and whose edge set consists of edges that are either in
E_1 or E_2, but not in both.
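
These graph operations map directly onto set operations. The sketch below is an illustration only: a graph is assumed to be a (vertex set, edge set) pair, undirected edges are frozensets, and all function names are hypothetical.

```python
def edge(u, v):
    """Undirected edge; (u,v) and (v,u) give the same object."""
    return frozenset((u, v))


def union(g1, g2):
    return (g1[0] | g2[0], g1[1] | g2[1])


def intersection(g1, g2):
    return (g1[0] & g2[0], g1[1] & g2[1])


def ring_sum(g1, g2):
    # Edges that lie in exactly one of the two edge sets.
    return (g1[0] | g2[0], g1[1] ^ g2[1])


def non_tree_edges(edges, inverted_tree_edges):
    """Drop from E every edge (u,v) whose reversal <v,u> is a tree edge."""
    used = {edge(v, u) for v, u in inverted_tree_edges}
    return edges - used


g1 = ({1, 2, 3}, {edge(1, 2), edge(2, 3)})
g2 = ({2, 3, 4}, {edge(2, 3), edge(3, 4)})
triangle = {edge(1, 2), edge(2, 3), edge(1, 3)}
# Inverted tree: sons 2 and 3 both point to their father 1.
rest = non_tree_edges(triangle, [(2, 1), (3, 1)])
```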

An  inverted  (directed)  tree  T  is  called  an  ordered  tree 
if the sons of every vertex in T are ordered. If v is the
ith son of a vertex in T, then the rank of v is i.

The  preorder  and  postorder  traversals  of  an  inverted 
(directed)  tree  are  defined  as  follows: 

Preorder traversal

(i) Visit the root of the tree.
(ii) Traverse each subtree of the root in preorder, in
order of rank.

Postorder traversal

(i) Traverse each subtree of the root in postorder, in
order of rank.
(ii) Visit the root of the tree.

Note  that  there  is  no  inorder  traversal  for  trees  as  there 
is  no  obvious  place  to  insert  a  root  among  its  descendants. 
If  in  the  course  of  traversing  an  ordered  tree  in  preorder, 
vertex  v  is  the  kth  vertex  visited,  then  the  preorder  number 
of v is defined to be k. Postorder numbers can be defined
similarly.
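
The two traversals and the resulting vertex numbers can be sketched recursively. This is a plain sequential illustration, not the parallel method of later chapters; the dictionary sons, which lists each vertex's sons in order of rank, and the function name are assumptions for the example.

```python
def traversal_numbers(sons, root):
    """Return (preorder number, postorder number) dictionaries."""
    pre, post = {}, {}

    def visit(v):
        pre[v] = len(pre) + 1          # visit the root first
        for s in sons.get(v, []):      # then each subtree, in order of rank
            visit(s)
        post[v] = len(post) + 1        # postorder: the root comes last

    visit(root)
    return pre, post


# Root 1 has sons 2 and 3; vertex 2 has sons 4 and 5.
sons = {1: [2, 3], 2: [4, 5]}
pre, post = traversal_numbers(sons, 1)
```

For this tree the preorder sequence is 1, 2, 4, 5, 3 and the postorder sequence is 4, 5, 2, 3, 1.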

Let T(V',E') be a directed tree, and u, v ∈ V'. The lowest
common ancestor LCA(u,v) of u and v in T is the vertex w ∈ V'
such that w is a common ancestor of u and v and any other
common ancestor of u and v in T is also an ancestor of w in
T. If T is a spanning tree of a connected, undirected graph
G and (u,v) is an edge in G−T, then the cycle in G
consisting of the paths [u*-*LCA(u,v)], [LCA(u,v)*-*v] and the
edge (v,u) is a fundamental cycle in G. An undirected graph
G(V,E) is 2-colorable (bipartite) if V can be partitioned
into V_1 and V_2 such that no edge in G has both of its
end-vertices in V_1 or both in V_2. For e ∈ E, e is a bridge in G iff e
is not on any cycle in G. Let B be the set of bridges in G;
then every connected component of the graph G'(V, E−B) is a
bridge-connected component of G. The bridge-connectivity
augmentation problem is the problem of adding the minimum
number of edges to a graph so as to bridge-connect the
graph. For a ∈ V, if there exist u, v ∈ V such that u, v, a are
all distinct and that every path connecting u and v in G
passes through a, then a is called a separation vertex of G.
A  graph  is  biconnected  if  it  contains  no  separation  vertex. 
Every  maximal  biconnected  subgraph  of  G  is  called  a 
biconnected  component  of  G. 
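
The definitions of the lowest common ancestor and of a fundamental cycle can be made concrete with a naive sequential sketch. It assumes the tree is given by father pointers (father[root] is None), which is an illustrative representation, and it is not the parallel algorithm developed in this thesis.

```python
def ancestors(father, v):
    """v, father(v), ... up to the root, in that order."""
    chain = []
    while v is not None:
        chain.append(v)
        v = father[v]
    return chain


def lca(father, u, v):
    anc_u = set(ancestors(father, u))
    for w in ancestors(father, v):   # climbing from v, the first hit is lowest
        if w in anc_u:
            return w
    return None


def fundamental_cycle(father, u, v):
    """Vertices on the cycle closed by the non-tree edge (u, v)."""
    w = lca(father, u, v)
    up = ancestors(father, u)
    up = up[:up.index(w) + 1]        # path from u up to LCA(u, v)
    down = ancestors(father, v)
    down = down[:down.index(w)]      # path from v up to, excluding, the LCA
    return up + down[::-1]


# Root 1 with sons 2 and 3; vertex 2 has sons 4 and 5.
father = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}
cycle = fundamental_cycle(father, 4, 3)
```

With the non-tree edge (4,3), the cycle runs 4, 2, 1, 3 and is closed by the edge (3,4).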

Let G(V,E) be an undirected graph. G is independent if
E = ∅ and G is complete if E = V×V. An undirected graph G(V,E)
is a split graph iff V can be partitioned into two disjoint
subsets V_1, V_2 such that the graph G_1(V_1,E_1) induced by V_1 is
independent and the graph G_2(V_2,E_2) induced by V_2 is
complete. We shall call {G_1(V_1,E_1), G_2(V_2,E_2)} a split of G.

A  clique  in  G  is  a  maximal  complete  subgraph  of  G. 
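
Whether a given partition is a split of G can be verified directly from the definition. The check below is a sketch under one stated assumption: completeness is tested over pairs of distinct vertices only, so self-loops are ignored. The function name is hypothetical.

```python
from itertools import combinations


def is_split(v1, v2, edges):
    """True if V1 induces an independent graph and V2 a complete one."""
    e = {frozenset(p) for p in edges}
    independent = all(frozenset(p) not in e for p in combinations(v1, 2))
    complete = all(frozenset(p) in e for p in combinations(v2, 2))
    return independent and complete


# Triangle on {1, 2, 3} with two extra vertices hanging off it.
edges = [(1, 2), (2, 3), (1, 3), (4, 1), (5, 2)]
good = is_split({4, 5}, {1, 2, 3}, edges)
bad = is_split({1, 4}, {2, 3, 5}, edges)
```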
Let P = [P(1), P(2), P(3), ..., P(n)] be a permutation of V.
Let E(P) = {(i,j) | P^-1(i) < P^-1(j) and i > j, or P^-1(i) > P^-1(j) and
i < j; i, j ∈ V}, where P^-1(i) is the element in V which P maps
into i. The permutation graph of P is the undirected graph
G(V,E(P)). A directed graph G'(V,E') is transitive if (i,j)
and (j,k) ∈ E' => (i,k) ∈ E'.
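
The edge set E(P) can be computed directly from the definition. In the sketch below, the list perm stores P(k) at position k-1, and each undirected edge is reported once as a pair (i, j) with i < j; both conventions and the function name are assumptions for the example.

```python
def permutation_graph_edges(perm):
    """Edges (i, j), i < j, whose order is reversed by the permutation."""
    n = len(perm)
    # inv[x] is the element that P maps into x, i.e. the inverse of P at x.
    inv = {value: pos for pos, value in enumerate(perm, start=1)}
    return {(i, j)
            for i in range(1, n + 1)
            for j in range(i + 1, n + 1)
            if inv[i] > inv[j]}


edges_p = permutation_graph_edges([3, 1, 2])     # P(1)=3, P(2)=1, P(3)=2
edges_id = permutation_graph_edges([1, 2, 3])    # identity permutation
edges_rev = permutation_graph_edges([3, 2, 1])   # reversal
```

The identity permutation yields a graph with no edges, while the reversal yields the complete graph on V.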



Chapter  2 

EFFICIENT  ALGORITHMS  FOR  THE  PRAM 


2.1  Introduction  and  Previous  Results 

In  this  chapter,  we  shall  present  efficient  algorithms 
for  the  class  of  graph  theoretic  problems  listed  in  the 
Introduction except the recognition problems of split graphs
and permutation graphs, which will be dealt with in Chapter 5.
The computer model we use is the widely accepted
PRAM [WYLL79]. In subsequent chapters, we will consider the
implementation  of  these  algorithms  on  other  computer  models. 

The  class  of  problems  we  investigate  in  this  chapter 
has  been  studied  by  various  people  before.  The  best  known 
results for the PRAM were due to Savage and Ja'Ja' [SAVA81].
They designed parallel algorithms for these problems and
achieved an O(lg^2 n) time bound with the processor-time
products being O(n^2 lg^2 n) for the directed spanning tree
problem and being O(n^3) or O(n^2 (lg n)^m), where m > 3, for the
remaining problems. In this chapter, the algorithm we
present for the lowest common ancestors problem takes
O(⌈q/nK⌉·lg n + n/K) time with nK (K > 0) processors, where q is
the number of vertex pairs whose lowest common ancestors are
to be found. The algorithm for the fundamental cycles
problem takes O(⌈|E|/nK⌉·lg n + n/K + lg^2 n) time with nK (K > 0)
processors, where E is the edge set of the undirected graph.
The algorithms for the directed spanning forest, the
2-colorability, the bridge-connectivity, the
bridge-connectivity augmentation and the biconnectivity
problems all take O(n/K + lg^2 n) time with nK (K > 0) processors.
In particular, an O(lg^2 n) time bound can be achieved with
K = ⌈n/lg n⌉ for the first two problems and with K = ⌈n/lg^2 n⌉ for
the remaining problems. Since the processor-time products of
our algorithms are at most O(n^2 lg n) for 0 < K < ⌈n/lg^2 n⌉, our
algorithms are better than those of Savage and Ja'Ja' in all cases
and in most cases use a factor of n lg n fewer processors.
Except for the algorithms for the first two problems, the
processor-time products of our algorithms are O(n^2), which
is optimal for dense graphs.
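
The quoted processor-time products can be checked by substituting the stated choices of K into the bounds. The derivation below is a sketch that drops ceilings and constant factors and, for the lowest common ancestors problem, assumes q ≤ n^2 (there are at most n^2 vertex pairs).

```latex
\begin{align*}
K = \frac{n}{\lg^2 n}:\quad
  & T = O\!\left(\frac{n}{K} + \lg^2 n\right) = O(\lg^2 n),
  \qquad P = nK = \frac{n^2}{\lg^2 n},
  \qquad T \cdot P = O(n^2). \\
K = \frac{n}{\lg n},\ q \le n^2:\quad
  & T = O\!\left(\frac{q}{nK}\,\lg n + \frac{n}{K}\right)
      = O(\lg^2 n + \lg n) = O(\lg^2 n), \\
  & P = nK = \frac{n^2}{\lg n},
  \qquad T \cdot P = O(n^2 \lg n).
\end{align*}
```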

Besides  being  more  efficient,  our  algorithms  also 
assume  bounded  parallelism  as  opposed  to  the  unbounded 
parallelism adopted by Savage, Ja'Ja' and many others.
Bounded  parallelism  is  more  realistic  as  it  can  cope  with 
the  situation  where  the  number  of  processors  available  is 
smaller  than  the  input  size. 

Throughout this chapter, we assume that the input to each
algorithm is an adjacency matrix, and that the arithmetic
operations +, - as well as the boolean operations each take one
time unit to execute.

2.2  Preliminary  Results 

2.2.1  Two  Useful  Lemmas 

In this section, we list two lemmas which will be used
frequently in analyzing the time and processor complexities
in this chapter.

Lemma 2.1: Given $n$ elements $\{a_0, a_1, \ldots, a_{n-1}\}$, let
$f$ be a function to be applied to every element. If computing
$f(a_i)$ takes $t$ time units and $K$ ($\ge 1$) processors are
provided, then $f(a_i)$, $0 \le i \le n-1$, can be computed in
$\lceil n/K \rceil \cdot t$ parallel time units.

Lemma 2.2: [CHIN81, CHIN82] Given $n$ elements
$\{a_0, a_1, \ldots, a_{n-1}\}$ and $K$ processors,
$A(n) = a_0 * a_1 * a_2 * \cdots * a_{n-1}$ can be computed in $T$
parallel time units where $*$ is any associative binary operator
and

$T = \lceil n/K \rceil - 1 + \lg K$   if $\lfloor n/2 \rfloor > K$
$T = \lg n$                           if $\lfloor n/2 \rfloor \le K$
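The bound of Lemma 2.2 can be illustrated with a small sequential simulation. The sketch below is only an assumption about how such a computation may be organized (each of the K simulated processors first folds its own block of about n/K elements, then the partial results are combined pairwise in a tree); the function name `parallel_reduce` and the step counter are ours and do not come from [CHIN81, CHIN82].

```python
from math import ceil


def parallel_reduce(items, op, k):
    """Simulate a K-processor reduction of `items` under the associative `op`.

    Returns (value, parallel_steps). This is a sequential simulation that
    only counts the parallel time units a PRAM would spend.
    """
    n = len(items)
    if n == 0:
        raise ValueError("need at least one element")
    k = max(1, min(k, n))
    block = ceil(n / k)

    # Phase 1: every processor folds its own block; all blocks run in
    # parallel, so the phase costs (block - 1) time units.
    partials = []
    for start in range(0, n, block):
        acc = items[start]
        for x in items[start + 1:start + block]:
            acc = op(acc, x)
        partials.append(acc)
    steps = block - 1

    # Phase 2: combine neighbouring partial results pairwise. Keeping the
    # left-to-right order means `op` need not be commutative.
    while len(partials) > 1:
        merged = [op(partials[i], partials[i + 1])
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:
            merged.append(partials[-1])
        partials = merged
        steps += 1
    return partials[0], steps
```

With 16 elements and K = 4 the simulation spends 3 + 2 = 5 time units, which agrees with the first case of the lemma; with K equal to the number of elements it degenerates into a pure binary tree of height lg n.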

2.2.2  Finding  All  Paths  from  the  Vertices  to  the  Roots  in  an 
Inverted  Forest 

In  this  section,  we  present  a  method  for  constructing 
an  array,  denoted  by  F+,  in  which  each  row  contains  a  path 
from  a  vertex  to  a  root  in  an  inverted  forest.  The  array 
will  be  very  useful  in  the  design  of  parallel  algorithms 
presented  in  the  following  sections. 

Let $T(V', E')$ be an inverted forest with $|V'| = n$. Without
loss of generality, we assume $V' = \{1, 2, \ldots, n\}$. Let
$\{T_j\}$ be the set of all inverted trees in $T$ and $\{r_j\}$ be
the set of all their roots.

Definition: $F: V' \to V'$ is a function such that

$F(i)$ = the father of the vertex $i$ in $T$, for $i \notin \{r_j\}$;
$F(r) = r$, $\forall r \in \{r_j\}$.

The function $F$ can be represented by a directed graph $F$
which can be constructed from $T$ by adding a self-loop at
each root $r_j$ in $T$.

From the function $F$, we define $F^k$, $k \ge 0$, as follows:

Definition: $F^k: V' \to V'$, $k \ge 0$, is a function such that

$F^0(i) = i$, $\forall i \in V'$;
$F^k(i) = F(F^{k-1}(i))$, $\forall i \in V'$, $k > 0$.

If $i$ is a vertex in $T_j$, $F^k(i)$ is the $k$th ancestor of $i$
in $T_j$ or $r_j$.

Definition: For each $i \in V'$, if $i$ is in $T_j$ for some $j$,
then $depth(i) = \min\{k \mid F^k(i) = r_j$ and $0 \le k \le n-1\}$.

The concepts $F^k(i)$, $k \ge 0$, and $depth(i)$ were first
introduced by Savage in [SAVA77]. It was shown that given
the function $F$ of a directed forest $T$ ($T$ could be a directed
forest or its inverted forest), $F^k(i)$, $0 \le k \le n-1$, and
$depth(i)$, $1 \le i \le n$, can be computed in $O(\lg n)$ time
with $n^2$ processors and $n \lceil n/\lg n \rceil$ processors
respectively. In the following, we will show in Theorem 2.3 that
$F^k(i)$, $0 \le k \le n-1$, $1 \le i \le n$, can indeed be
computed in $O(\lg n)$ time with $n \lceil n/\lg n \rceil$
processors or in $O(\lg^2 n)$ time with
$n \lceil n/\lg^2 n \rceil$ processors, and then $depth(i)$ in
$O(\lg n)$ additional time with $n$ processors.

Theorem 2.3: (i) Given the function $F$ of a directed or an
inverted forest $T$, $F^k(i)$, $1 \le i \le n$, $0 \le k \le n-1$,
can be computed in $O(n/K + \lg n)$ time with $nK$ ($K > 0$)
processors on a PRAM. (ii) Given $F^k(i)$, $1 \le i \le n$,
$0 \le k \le n-1$, and $nK$ ($K > 0$) processors, $depth(i)$,
$1 \le i \le n$, can be computed in $O(\lg(n/K))$ time if
$K \ge 1$ or in $O(\lceil 1/K \rceil \cdot \lg n)$ time if
$0 < K < 1$ on a PRAM.

Proof: To compute $F^k$, for all $0 \le k \le n-1$, we proceed in
two steps:

1.  for $i$: $1 \le i \le n$ pardo $F^0(i) := i$; $F^1(i) := F(i)$ dopar;

2.  for $t := 0$ to $\lg(n-1) - 1$ do
        for $s$: $1 \le s \le 2^t$, $i$: $1 \le i \le n$ pardo
            $F^{2^t + s}(i) := F^{2^t}(F^s(i))$
        dopar;

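If $nK$ processors are given, it is clear that step 1 can be
computed in $O(\lceil 1/K \rceil)$ time (Lemma 2.1). Step 2 can be
computed in⁵

$\sum_{t=0}^{\lg(n-1)-1} \lceil 2^t/K \rceil$
$= \lg K + \sum_{t=\lg K}^{\lg(n-1)-1} \lceil 2^t/K \rceil$
$\le \lg K + \lg(n-1) - \lg K + \frac{1}{K} \sum_{t=\lg K}^{\lg(n-1)-1} 2^t$
$= O(n/K + \lg n)$ time units.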
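To make the doubling in step 2 concrete, the following sequential Python sketch fills the table of $F^k$ values block by block: the block for a given t produces $F^{2^t+1}, \ldots, F^{2^{t+1}}$ from rows that already exist. The name `ancestor_table`, the 0-based vertex labels and the layout `table[k][i]` (power first, vertex second) are our own assumptions for the illustration; on a PRAM the inner loops would be the pardo of step 2.

```python
def ancestor_table(father):
    """Return table with table[k][i] = F^k(i) for 0 <= k <= n-1.

    `father[i]` is F(i) for vertices 0..n-1; every root is its own father.
    """
    n = len(father)
    table = [list(range(n))]          # F^0 is the identity
    if n > 1:
        table.append(list(father))    # F^1 = F
    t = 0
    while len(table) < n:
        base = table[2 ** t]          # F^(2^t), already computed
        for s in range(1, 2 ** t + 1):
            if len(table) >= n:
                break
            # F^(2^t + s)(i) = F^(2^t)(F^s(i)); this row lands at index 2^t + s
            table.append([base[table[s][i]] for i in range(n)])
        t += 1
    return table
```

For the chain-like forest `father = [0, 0, 1, 2, 2]` the row `table[2]` is `[0, 0, 0, 1, 1]`, i.e. the grandfather of every vertex, with the root repeated once it has been reached.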

Once $F^k(i)$, $1 \le i \le n$, $0 \le k \le n-1$, are computed,
$depth(i)$, $1 \le i \le n$, can be found by performing a binary
search on the ordered sequence $F^0(i), F^1(i), \ldots, F^{n-1}(i)$,
for each $i$, searching for the left-most occurrence of $r_j$ using
$F^{n-1}(i)$ ($= r_j$) as the key. This takes a total of
$O(\lceil 1/K \rceil \cdot \lg n)$ time units if $0 < K < 1$.
For $K \ge 1$, the search is performed in the following way:
divide the sequence into segments of $\lceil n/K \rceil$ elements,
assign one processor to each segment and perform simultaneously a
binary search for the left-most occurrence of $r_j$ in each
segment. After this step, every processor compares the element it
finds with the preceding and succeeding elements in the sequence.
There is exactly one processor which does not have all the three
elements distinct or identical and this processor locates the
left-most occurrence of $r_j$ in the sequence. This takes a total
of $O(\lg \lceil n/K \rceil)$ time units. ■

⁵ Due to the limitation of our character set, we must use lg
to represent lg in superscripts and subscripts.
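The single-processor search used in part (ii) can be sketched as follows. The sketch assumes the sequence $F^0(i), \ldots, F^{n-1}(i)$ is already available as a Python list; the helper name `depth_by_binary_search` is hypothetical. The search is valid because the sequence consists of non-root vertices followed by a run of copies of the root, so the test "equals the root" switches from false to true exactly once.

```python
def depth_by_binary_search(seq):
    """Return the smallest k with seq[k] equal to the root seq[-1].

    `seq[k]` is assumed to hold F^k(i) for 0 <= k <= n-1, so the last
    entry is the root of the tree containing vertex i.
    """
    root = seq[-1]
    lo, hi = 0, len(seq) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if seq[mid] == root:
            hi = mid          # left-most occurrence is at mid or before
        else:
            lo = mid + 1      # still below the root; look to the right
    return lo
```

For example, a vertex whose path is 1, 11, 13, 12 in a 15-vertex tree yields the sequence `[1, 11, 13, 12, 12, ..., 12]`, for which the function returns 3.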

The actual computations of $F^k(i)$, $1 \le i \le n$,
$0 \le k \le n-1$, and $depth(i)$, $1 \le i \le n$, are performed
in an array $F^+$ in which $F^+[i,k]$ contains $F^k(i)$. After the
computations are finished, each row of $F^+$ is right shifted so
that all the $r_j$'s except the left-most one are eliminated. As a
consequence, the right-most column of the array contains only the
roots from $\{r_j\}$. Furthermore, for each vertex $i$, all
occurrences of $i$ appear only in column $(n-1) - depth(i)$. For
each row $i$, a number, $n+i$, acting as an undefined value, is
inserted into the first $(n-1) - depth(i)$ entries. These
adjustments are done for convenience and not out of necessity and
they take $O(n/K)$ time with $nK$ ($K > 0$) processors (Lemma 2.1).
The adjusted array, $F^+$, of an inverted tree is depicted in
Figure 2.1. Note that the $i$th row in $F^+$ contains the path from
vertex $i$ to a root in $T$.
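The adjusted array can be reproduced for the tree of Figure 2.1 with a few lines of Python. The father map below is read off the rows of the array printed in that figure; the function name `adjusted_rows` is ours, and the rows are built here by following father pointers one at a time, which is a sequential stand-in for the doubling and right-shift described above, not the parallel procedure itself.

```python
# Father of every vertex in the tree of Figure 2.1 (the root 12 is its own father).
FIGURE_2_1_FATHER = {
    1: 11, 2: 3, 3: 14, 4: 3, 5: 7, 6: 7, 7: 15, 8: 13,
    9: 11, 10: 11, 11: 13, 12: 12, 13: 12, 14: 12, 15: 12,
}


def adjusted_rows(father):
    """Return {i: row i of the adjusted array} for vertices 1..n.

    Row i holds the path from i to its root, right-aligned, and is padded
    on the left with the undefined value n + i.
    """
    n = len(father)
    rows = {}
    for v in range(1, n + 1):
        path = [v]
        while father[path[-1]] != path[-1]:
            path.append(father[path[-1]])
        # len(path) = depth(v) + 1, so the padding fills (n-1) - depth(v) entries
        rows[v] = [n + v] * (n - len(path)) + path
    return rows
```

Calling `adjusted_rows(FIGURE_2_1_FATHER)[1]` gives eleven copies of 16 followed by 1, 11, 13, 12, which matches the first row of the figure.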

 i \ k  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
     1 16 16 16 16 16 16 16 16 16 16 16  1 11 13 12
     2 17 17 17 17 17 17 17 17 17 17 17  2  3 14 12
     3 18 18 18 18 18 18 18 18 18 18 18 18  3 14 12
     4 19 19 19 19 19 19 19 19 19 19 19  4  3 14 12
     5 20 20 20 20 20 20 20 20 20 20 20  5  7 15 12
     6 21 21 21 21 21 21 21 21 21 21 21  6  7 15 12
     7 22 22 22 22 22 22 22 22 22 22 22 22  7 15 12
     8 23 23 23 23 23 23 23 23 23 23 23 23  8 13 12
     9 24 24 24 24 24 24 24 24 24 24 24  9 11 13 12
    10 25 25 25 25 25 25 25 25 25 25 25 10 11 13 12
    11 26 26 26 26 26 26 26 26 26 26 26 26 11 13 12
    12 27 27 27 27 27 27 27 27 27 27 27 27 27 27 12
    13 28 28 28 28 28 28 28 28 28 28 28 28 28 13 12
    14 29 29 29 29 29 29 29 29 29 29 29 29 29 14 12
    15 30 30 30 30 30 30 30 30 30 30 30 30 30 15 12

Figure 2.1 A directed tree and its array $F^+$.
Note that since n=15, any number greater than 15 serves as an
undefined value in the array.

2.3 Constructing a Directed Spanning Forest in an Undirected
Graph

In this section, we present an efficient parallel
algorithm for constructing a directed spanning forest in an
undirected graph $G(V,E)$. In view of the fact that it is the
inverted spanning forest of $G$ which is useful in the design
of other parallel algorithms in the following sections, the
algorithm presented below actually constructs an inverted
spanning forest. Nevertheless, converting an inverted
spanning forest into a directed spanning forest is
straightforward. This algorithm will serve as the backbone
of the other algorithms presented in the following sections.
It takes $O(n/K + \lg^2 n)$ time if $nK$ ($K \ge 1$) processors are
available and could achieve the $O(\lg^2 n)$ time bound using the
optimal number of processors. The previous best result takes
$n^2$ processors to achieve the $O(\lg^2 n)$ time bound [SAVA77].

This algorithm is based on the algorithm for finding an
undirected spanning forest presented in [CHIN82] and the
array $F^+$ presented in the last section. The latter is used
to assign a direction to each edge in the undirected
spanning forest generated by the former.⁶

We first give a general description of the strategy used in our
algorithm. In the course of running the algorithm for finding an
undirected spanning forest [CHIN82], a number of 1-tree-loops
[HIRS79]⁷ are generated. Each of these 1-tree-loops is a directed
graph whose vertices are supervertices generated during the
previous iteration (a supervertex is a vertex in $G$ or a
1-tree-loop). The edges of these 1-tree-loops will be included in
the undirected spanning forest and all these edges are directed
edges whose directions are ignored by the algorithm in [CHIN82].
If the only loop in a 1-tree-loop is destroyed by eliminating the
out-going edge from the smallest-numbered vertex, the resulting
graph is an inverted tree. As a result, when the loops of all the
1-tree-loops are destroyed in this way, the resulting graph (built
by embedding the modified (acyclic) 1-tree-loops created during
one iteration into the modified (acyclic) 1-tree-loops created
during the following iteration) may well be an inverted spanning
forest. Unfortunately, this is not the case in general because
some vertices may end up having two fathers. This situation is
depicted in Figure 2.2, where a directed edge
$\langle a,b \rangle$ is selected during iteration $j+1$ to connect
two supervertices $S_1$ and $S_2$ created during iteration $j$. The
two graphs resulting from the two supervertices are inverted
trees. However, since $a$ is not the root $r_1$ of $S_1$, $a$ will
have two fathers after $S_1$ and $S_2$ have been included into a
single supervertex. Therefore, the graph $S_1 \cup S_2$ is not an
inverted tree, by definition, unless the directions of all the
edges on the path from $a$ to $r_1$ are reversed. The same
situation occurs in $S_2 \cup S_3$ when the directed edge
$\langle c,d \rangle$ is selected to connect $S_2$ and $S_3$. To
overcome this difficulty, we have to reverse the directions of all
edges on the path from $a$ to $r_1$ and those on the path from $c$
to $r_2$. The array $F^+$, described in Section 2.2.2, contains the
path from any vertex to a root in an inverted forest $T$; hence we
can generate the array $F^+$ covering both $S_1$ and $S_2$. By
retrieving the $a$th row and the $c$th row of $F^+$, we can
identify the set of all edges whose directions are to be reversed
in $S_1$ and $S_2$ respectively.

Figure 2.2 Adjusting the edges of 1-tree-loops
to form an inverted tree. (The direction of every edge on the
marked path is to be reversed.)

⁶ We assume the reader is familiar with the undirected
spanning forest algorithm. For those who are not, we refer
them to reference [CHIN82].

⁷ A 1-tree-loop is a directed graph in which every vertex
has outdegree 1 and in which there is exactly one cycle and
the length of the cycle is 2.
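The edge reversal described above amounts to re-rooting an inverted tree at the vertex $a$. A minimal sequential sketch, assuming the tree is stored as a father map in which the root is its own father, is given below; the name `reroot` is hypothetical, and the sketch walks the path directly instead of reading it from a row of $F^+$.

```python
def reroot(father, a):
    """Return a copy of `father` in which every edge on the path from `a`
    to the old root has been reversed, so that `a` becomes the new root.

    `father[v]` is the father of v; the root satisfies father[r] == r.
    """
    new_father = dict(father)
    new_father[a] = a                 # a becomes a root (self-loop)
    prev, cur = a, father[a]
    while cur != prev:                # stops at the old root's self-loop
        nxt = father[cur]
        new_father[cur] = prev        # reverse the edge prev -> cur
        prev, cur = cur, nxt
    return new_father
```

Vertices that do not lie on the path keep their fathers, so only the edges named in the text change direction; after the call, a new edge leaving `a` can be added without giving any vertex two fathers.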




Our  algorithm  runs  in  two  stages. 

Algorithm DSF

Stage 1: (* The first stage is basically a modified version
of the algorithm for finding an undirected spanning forest.
We refer the reader to reference [CHIN82] for the details. *)
Execute the algorithm for finding an undirected spanning
tree; during each iteration $j$, $1 \le j \le \lg n$, record the
following information:

    a.  Convert the forest of all 1-tree-loops generated
        during this iteration into a forest of inverted trees
        by eliminating the edge from the smallest-numbered
        vertex of each 1-tree-loop and store the forest in a
        vector $F_j$. (* Note: This vector acts as the function
        $F$ defined in Section 2.2.2. *)

    b.  Record the 'actual' edges in $G$ establishing the
        connection specified in $F_j$. (* Note: The edges
        recorded in $F_j$ are pseudo edges which connect
        'supervertices'. They do not exist in $G$. However, for
        each pseudo edge, there exists a corresponding actual
        edge in $G$. *)

    c.  The vector $D[1..n]$ generated during this iteration
        is stored as $D_j$. (* Note: $D_j[v]$ is the supervertex
        containing vertex $v$ when iteration $j$ is completed. *)

Stage 2:

    1.  Generate $F_j^+$'s from $F_j$, $1 \le j \le \lg n$.

    2.  (* Adjust the directions of the edges, starting from
        those recorded during iteration $\lg n$, gradually down to
        those recorded during iteration 1. *)

        $R' := \{v \in V \mid D_{\lg n}[v] = v\}$;

        (* Note: In the following for loop, $R'$ contains the tails
        of those actual edges in $G$ which connect two supervertices
        in the inverted trees generated during iteration $l$, where
        $j < l \le \lg n$. It includes all those vertices which have
        two or more fathers in the directed graph formed upon the
        inverted trees. *)

        for $j := \lg n$ downto 1 do
        begin
            i)   For every $r' \in R'$,
                 reverse the direction of every 'pseudo' edge lying
                 on the path from the supervertex $D_j[r']$ to the
                 root of the inverted tree, in $F_j$, containing
                 $D_j[r']$;

            ii)  Output all the 'actual' edges in $G$ corresponding
                 to the pseudo edges in $F_j$;

            iii) $R' := R' \cup \{v \in V \mid v$ is the tail of an
                 'actual' edge output in step ii)$\}$
        end; ■

A  complete  example  is  given  in  Figure  2.3  and  a 
detailed  implementation  using  the  method  described  above  is 
given  in  the  Appendix. 

Theorem  2.4:  Algorithm  DSF  correctly  generates  an  inverted 
spanning  forest  for  an  undirected  graph. 

Proof: (Backward induction) In Stage 1, an inverted forest
$F_j$ is correctly generated during each iteration
$j$, $1 \le j \le \lg n$ [CHIN82]. In Stage 2, suppose that after
processing $F_l$, $j \le l \le \lg n$, an inverted forest $F'_j$ is
created. Clearly, $F'_j$ and $F_j$ must have the same vertex set
$V_j$. When processing $F_{j-1}$, it should be clear that there
exists a one-to-one correspondence between the vertices in $V_j$
and the inverted trees in $F_{j-1}$. This implies that no two
instances of $r'$ in $R'$ will belong to the same inverted tree in
$F_{j-1}$.

As a result, after Step 2 i), each inverted tree in $F_{j-1}$ is
effectively modified so as to root at the supervertex
$D_{j-1}[r']$. These modified inverted trees are then embedded
into the inverted forest $F'_j$ in Step 2 ii); the resulting
directed graph $F'_{j-1}$ is clearly an inverted forest. But
$F'_{\lg n} = F_{\lg n}$ is an inverted forest initially; therefore
by induction, $F'_1$ must be an inverted forest and hence an
inverted spanning forest for $G$. ■

Figure 2.3(i) $G(V,E)$.

Figure 2.3(ii) A potential inverted spanning tree of $G$. The
three arrow styles used in the figure mark the directed edges
selected during the first, the second and the third iteration
respectively.

Figure 2.3(iii) An inverted spanning tree of $G$.

Theorem 2.5: Finding an inverted spanning forest takes
$O(n/K + \lg^2 n)$ time with $nK$ ($K \ge 1$) processors on a PRAM.

Proof: Stage 1 takes $O(n/K + \lg^2 n)$ time with $nK$ ($K \ge 1$)
processors [CHIN82]. Since the total number of edges in the
inverted forests is at most

$\sum_{j=1}^{\lg n} \lfloor n/2^{j-1} \rfloor < 2n$,

the creation of $F_j^+$, $1 \le j \le \lg n$, in Step 1 of Stage 2
can be done in $O(n/K + \lg n)$ time with $nK$ ($K > 0$) processors
(Theorem 2.3). Steps 2 ii) and iii) each take $O(1)$ time for each
iteration. Since the size of $F_j^+$, $1 \le j \le \lg n$, is
$\lceil n/2^{j-1} \rceil \times \lceil n/2^{j-1} \rceil$, Step 2 i)
requires

$\sum_{j=1}^{\lg n} \lceil \lceil n/2^{j-1} \rceil^2 / nK \rceil$
$\le \lg n + \sum_{j=1}^{\lg n} \lceil n/2^{j-1} \rceil^2 / nK$
$= O(n/K + \lg n)$ time for $\lg n$ iterations.

Hence the theorem. ■

Note that the processor-time product is $O(n^2)$ when
$1 \le K \le \lceil n/\lg^2 n \rceil$; the algorithm is thus
optimal for dense graphs.

2.4 Finding the Lowest Common Ancestors of q Vertex Pairs
in a Directed Tree

As with the inverted spanning forest algorithm, the
lowest common ancestor algorithm presented in this section
plays a key role in the development of parallel algorithms
for other graph theoretic problems to be discussed in the
following sections. The previous best algorithm was due to
Savage and Ja'Ja' [SAVA81]. Their algorithm first computes
the transitive closure of the adjacency matrix of the
directed tree, and then uses the transitive closure to
determine the set of all common ancestors of every vertex
pair. The min operation (taken with respect to the ordering of
the tree) is then applied over each set of common ancestors to
determine the lowest common ancestors for all the vertex pairs.
Since there are at worst $O(n^2)$ vertex pairs and each takes
$n/2$ processors to evaluate the min operator, this algorithm
requires $O(n^3)$ processors to achieve the $O(\lg^2 n)$ time
bound.

In this section, we shall show that we can combine the
array $F^+$ described in Section 2.2.2 and the binary search
technique to develop a new algorithm for the lowest common
ancestor problem which takes at worst $n^2$ processors to
achieve the $O(\lg n)$ time bound.

Let $T(V', E')$ be a directed tree and $V' = \{1, 2, \ldots, n\}$.
Let $a$ and $b$ be a pair of vertices and let $c$ be their lowest
common ancestor; then row $a$ and row $b$ of $F^+$ will have
identical contents between column $(n-1) - depth[c]$ and column
$n-1$, inclusive, and will have different contents in the other
columns. As a result, to determine $c$, we can perform a
binary search on row $a$ and row $b$ simultaneously in the
following way: if the two entries being examined in row $a$
and row $b$ (in the same column, of course) are different, the
search is continued on the right-half, otherwise it is
continued on the left-half. It takes $O(\lg n)$ time units to
find $c$ with one processor. In general, we have:

Theorem 2.6: Given $q$ vertex pairs, $1 \le q \le n^2$, finding the
lowest common ancestors for these vertex pairs takes
$O(\lceil q/nK \rceil \cdot \lg n + n/K)$ time on a PRAM if $nK$
($K > 0$) processors are available.

Proof: Constructing the array $F^+$ takes $O(n/K + \lg n)$ time
(Theorem 2.3) and finding the lowest common ancestors of the
$q$ vertex pairs takes $\lceil q/nK \rceil \cdot \lg n$ time units
if $nK < q \le n^2$ (Lemma 2.1), or $\lg n + 1$ time units if
$nK \ge q$. Thus finding the lowest common ancestors of $q$
($1 \le q \le n^2$) vertex pairs takes
$O(\lceil q/nK \rceil \cdot \lg n + n/K)$ time with $nK$ ($K > 0$)
processors. ■
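The simultaneous binary search on two rows of the adjusted array can be sketched for a single vertex pair as follows. The father map is the tree read off Figure 2.1, and the helper names `adjusted_rows` and `lowest_common_ancestor` are our own; the rows are built sequentially here, so the sketch illustrates only the search, not the parallel construction of the array. The case a = b is handled separately because the two rows would then share their undefined padding as well.

```python
FIGURE_2_1_FATHER = {
    1: 11, 2: 3, 3: 14, 4: 3, 5: 7, 6: 7, 7: 15, 8: 13,
    9: 11, 10: 11, 11: 13, 12: 12, 13: 12, 14: 12, 15: 12,
}


def adjusted_rows(father):
    """Row i = path from i to the root, right-aligned, padded with n + i."""
    n = len(father)
    rows = {}
    for v in range(1, n + 1):
        path = [v]
        while father[path[-1]] != path[-1]:
            path.append(father[path[-1]])
        rows[v] = [n + v] * (n - len(path)) + path
    return rows


def lowest_common_ancestor(rows, a, b):
    """Binary search for the left-most column in which rows a and b agree."""
    if a == b:
        return a
    row_a, row_b = rows[a], rows[b]
    lo, hi = 0, len(row_a) - 1        # the last column always agrees (same tree)
    while lo < hi:
        mid = (lo + hi) // 2
        if row_a[mid] == row_b[mid]:
            hi = mid                  # agreement: continue on the left half
        else:
            lo = mid + 1              # disagreement: continue on the right half
    return row_a[lo]
```

With `rows = adjusted_rows(FIGURE_2_1_FATHER)`, the call `lowest_common_ancestor(rows, 1, 8)` returns 13, the first vertex shared by the paths 1, 11, 13, 12 and 8, 13, 12.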

A detailed description of this algorithm is given in
the Appendix (see Algorithm LCA). In particular, when $K = n$
and $K = \lceil n/\lg n \rceil$, the lowest common ancestors can be
found in $O(\lg n)$ and $O(\lg^2 n)$ time respectively.

2.5  Finding  all  Fundamental  Cycles  of  an  Undirected  Graph 

Without loss of generality, we assume that the
undirected graph $G(V,E)$ is connected from this section
onwards, unless otherwise stated.

It is known that a set of fundamental cycles of a
connected, undirected graph $G(V,E)$ can be determined from a
spanning tree $T(V,E')$ of $G$ [REIN77]. Specifically, let
$LCA(a,b)$ be the lowest common ancestor of $a$ and $b$ in $T$ and
let $(a,b)$ be an edge in $G - T$; then $(a,b)$ together with the
paths $[b \to^* LCA(a,b)]$ and $[LCA(a,b) \to^* a]$ forms a
fundamental cycle.

Based on the above observation, we can easily find a
set of fundamental cycles of $G$ as follows:

First, an inverted spanning tree $T$ of $G$ is found, using
the algorithm presented in Section 2.3 which takes
$O(n/K + \lg^2 n)$ time with $nK$ ($K \ge 1$) processors.
Algorithm LCA (see Appendix) is then called to determine the
lowest common ancestor for every pair of vertices $(a,b)$ in
$G - T$. The algorithm returns the ordered pair $(LCA^+, F^+)$ and
the vector $depth$, where $LCA^+[a,b]$ contains the lowest common
ancestor of $(a,b)$. A vector $P^+$ is then created such that
$P^+[v]$ contains the value $(n-1) - depth[v]$ which is the column
number of $v$ in $F^+$. Hence, for each $(a,b)$ in $G - T$, the
path from column $P^+[a]$ to column $P^+[LCA^+[a,b]]$ in row $a$
and the path from column $P^+[b]$ to column $P^+[LCA^+[a,b]]$ in
row $b$ of $F^+$ and the edge $(a,b)$ determine a fundamental
cycle in $G$.
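As an illustration of how a cycle is read out of the rows of the adjusted array, the sketch below slices row a and row b at the columns given by $P^+$. The tree is again the one read off Figure 2.1, and the pair (1, 8) is used as a hypothetical non-tree edge purely for the example; the names `adjusted_rows` and `fundamental_cycle` are ours, and the lowest common ancestor is found by the binary search of Section 2.4.

```python
FIGURE_2_1_FATHER = {
    1: 11, 2: 3, 3: 14, 4: 3, 5: 7, 6: 7, 7: 15, 8: 13,
    9: 11, 10: 11, 11: 13, 12: 12, 13: 12, 14: 12, 15: 12,
}


def adjusted_rows(father):
    """Row i = path from i to the root, right-aligned, padded with n + i."""
    n = len(father)
    rows = {}
    for v in range(1, n + 1):
        path = [v]
        while father[path[-1]] != path[-1]:
            path.append(father[path[-1]])
        rows[v] = [n + v] * (n - len(path)) + path
    return rows


def fundamental_cycle(father, a, b):
    """Vertices of the cycle closed by the non-tree edge (a, b), a != b,
    listed from a up to LCA(a, b) and then down to b."""
    rows = adjusted_rows(father)
    # column of v in its own row, i.e. (n-1) - depth[v]
    col = {v: rows[v].index(v) for v in rows}
    lo, hi = 0, len(father) - 1
    while lo < hi:                          # left-most column where rows agree
        mid = (lo + hi) // 2
        if rows[a][mid] == rows[b][mid]:
            hi = mid
        else:
            lo = mid + 1
    lca = rows[a][lo]
    up = rows[a][col[a]:col[lca] + 1]       # a, ..., LCA(a, b)
    down = rows[b][col[b]:col[lca]]         # b, ..., child of LCA(a, b)
    return up + down[::-1]
```

For the assumed edge (1, 8) the result is `[1, 11, 13, 8]`: the tree paths 1, 11, 13 and 13, 8 closed by the edge between 8 and 1.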

The correctness of the algorithm is easily verified.
Since the number of vertex pairs is $q = |E| - |E'|$, the algorithm
obviously takes
$O(\lceil |E|/nK \rceil \cdot \lg n + n/K + \lg^2 n)$ time with
$nK$ ($K \ge 1$) processors. In particular, the $O(\lg^2 n)$ time
bound is achieved with $K = n/\lg n$. Note that the output of the
algorithm is stored in an $O(n^2)$ compact data structure, which
consists of the triple $(P^+, LCA^+, F^+)$.

At this point, it is interesting to note that the best
sequential algorithm for the fundamental cycle problem has
time complexity $O(n^3)$ [REIN77]. Our algorithm presented here
immediately implies a sequential algorithm having time
complexity $O(n^2 \lg n)$. While our performance is better, we do
not intend to claim that it is an improvement over the
previous result. This is because the output data structures
of the two sequential algorithms are substantially
different. However, for cases where the fundamental cycle
algorithm is used as an internal routine, our algorithm will
be better as it requires less time and space.

2.6  2-coloring  an  Undirected  Graph 

No previous work was reported for this problem except
[JAJA82] in which the $O(\lg^2 n)$ time and $n^2$ processor
complexities were mentioned. However, the description of the
algorithm was not given. In this section, we shall present
an efficient algorithm which achieves the $O(\lg^2 n)$ time bound
using only $n \lceil n/\lg^2 n \rceil$ processors. We first prove
a lemma.

Lemma 2.7: An undirected graph $G(V,E)$ is 2-colorable
(bipartite) iff it has no fundamental cycles of odd length.

Proof: The 'only if' part is immediate from the well-known
property of bipartite graphs, namely an undirected graph is
bipartite iff it has no cycle of odd length.

For the 'if' part, let $G$ have no fundamental cycles of odd
length and let $C$ be any cycle in $G$. There exists a set of
fundamental cycles $\Gamma$ such that $C = \oplus\Gamma$, the ring
sum of the cycles in $\Gamma$ [REIN77]. Consider two fundamental
cycles $C_1$ and $C_2$ in $\Gamma$. Let $C' = C_1 \oplus C_2$ and
let $\ell(C_1)$ denote the length of $C_1$. Clearly,
$\ell(C') = \ell(C_1) + \ell(C_2) - 2 \cdot \ell(C_1 \wedge C_2)$,
where $\wedge$ denotes 'set intersection' here. Since $\ell(C_1)$,
$\ell(C_2)$ and $2 \cdot \ell(C_1 \wedge C_2)$ are all even,
$\ell(C')$ has to be even. A simple induction will reveal
that $C = \oplus\Gamma$ is an even cycle. ■

From Lemma 2.7, we immediately have:

Corollary 2.8: Let $T$ be an inverted spanning tree of $G$. $G$ is
2-colorable (bipartite) iff for any edge $e$ in $G - T$, one end
vertex of $e$ is of even depth while the other is of odd depth.

Our algorithm is based on Corollary 2.8. The input to
the algorithm is an adjacency matrix of the undirected graph
$G(V,E)$. First, an inverted spanning tree $T$ of $G$ is
constructed. A flag is then associated with every vertex
pair in $V \times V$. This flag is set to true initially. Then for
every non-tree edge $(u,v)$ in $G - T$, the condition "Is one of
the depths of $u$, $v$ odd while the other is even?" is tested.
If the answer is negative, then the associated flag will be
set to false. After this step, all the flags are anded
together. $G$ is bipartite iff the result is true. If $G$ is
bipartite, then the vertex set $V$ is partitioned into $V_1$ and
$V_2$. This can be accomplished by sorting the set of ordered
pairs $\{($'$depth(v)$ is odd'$, v) \mid v \in V\}$.

Algorithm Bipartite:

1.  Construct an inverted spanning tree $T$ for $G(V,E)$.

2.  (i)  for all $(u,v) \in V \times V$ pardo $flag[u,v] := true$ dopar;

    (ii) for all $(u,v)$ in $G - T$ pardo
             $flag[u,v] := ((depth[u]$ is odd$) \wedge (depth[v]$ is even$))$
                           $\vee ((depth[u]$ is even$) \wedge (depth[v]$ is odd$))$
         dopar;

3.  (i)  $Bipartite := \bigwedge_{i,j} flag[i,j]$;

    (ii) if $Bipartite$ then
         begin
             $V_1 := \{v \mid depth[v]$ is even$\}$;
             $V_2 := \{v \mid depth[v]$ is odd$\}$
         end; ■
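A sequential rendering of Algorithm Bipartite may help to check the logic. In the sketch below the spanning forest of step 1 is obtained by breadth-first search, which is a substitution made only to keep the example short and is not the parallel construction of Section 2.3; the function name `bipartition` and the 0-based vertex labels are likewise assumptions. The parity test is applied to every edge, since tree edges always join depths that differ by one and therefore pass trivially.

```python
from collections import deque


def bipartition(adj):
    """Return (V1, V2) if the graph with adjacency matrix `adj` is
    2-colorable, otherwise None."""
    n = len(adj)
    depth = [None] * n

    # Step 1 (stand-in): BFS spanning forest, recording the depth of each vertex.
    for root in range(n):
        if depth[root] is not None:
            continue
        depth[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and depth[v] is None:
                    depth[v] = depth[u] + 1
                    queue.append(v)

    # Steps 2 and 3(i): one flag per edge, then the conjunction of all flags.
    flags = [depth[u] % 2 != depth[v] % 2
             for u in range(n) for v in range(n) if adj[u][v]]
    if not all(flags):
        return None

    # Step 3(ii): split the vertices by the parity of their depth.
    v1 = [v for v in range(n) if depth[v] % 2 == 0]
    v2 = [v for v in range(n) if depth[v] % 2 == 1]
    return v1, v2
```

A 4-cycle is split into its two diagonals, while a triangle is rejected because its single non-tree edge joins two vertices of equal depth.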

Theorem 2.9: Algorithm Bipartite takes $O(n/K + \lg^2 n)$ time
with $nK$ ($K \ge 1$) processors on a PRAM.

Proof: With $nK$ ($K \ge 1$) processors, Step 1 takes
$O(n/K + \lg^2 n)$ time (Theorem 2.5). By Lemma 2.1, Steps 2(i)
and 2(ii) take $O(n/K)$ time. Step 3(i) takes $O(n/K + \lg K)$
time units (Lemma 2.2). Step 3(ii) takes at most
$O(\lg n \cdot \lg\lg n)$ time [BORO82].

The theorem thus follows. ■

2.7  Finding  the  HLCA(u)'s 

2.7.1 Motivation and Definition

In  the  following  sections,  the  set  of  fundamental 
cycles  of  G  plays  an  important  role  in  developing  optimal 
algorithms  for  the  bridge-connectivity  and  biconnectivity 
problems.  As  a  result,  the  efficiencies  of  these  algorithms 
rely  on  how  well  we  can  manipulate  the  fundamental  cycles. 

To  prevent  any  fundamental  cycle  from  being  considered 
excessively,  we  associate  with  each  of  them  exactly  two 
vertices  and  consider  it  only  at  those  two  vertices.  These 
two  vertices  are  determined  as  follows:  let  T  be  an  inverted 
spanning  tree  on  which  the  fundamental  cycles  are  generated 
and  C  be  any  of  the  fundamental  cycles.  The  two  vertices 
associated  with  C  are  the  end-vertices  of  the  non-tree  edge 
determining  C.  With  this  strategy,  every  fundamental  cycle 
is considered exactly twice.

Let $u$ be any vertex in $G$. We find the highest vertex in
$T$ which can be reached from $u$ via a fundamental cycle
associated with $u$. This vertex is clearly an ancestor of $u$.
Furthermore, all the edges on the closed path from $u$ to this
vertex are guaranteed to lie within the same fundamental
cycle while edges which lie below $u$ or above this vertex may
or may not have this property. We denote this vertex by
$HLCA(u)$ (the prefix H stands for the highest). A precise
definition is given below.

Definition: Let $G(V,E)$ be an undirected graph and $T(V,E')$ be
its inverted spanning tree. Let $u \in V$; $HLCA(u) = LCA(u,v)$
where $(u,v) \in E - E' \cup \{(u,u)\}$ and
$depth(LCA(u,v)) \le depth(LCA(u,v'))$,
$\forall (u,v') \in E - E' \cup \{(u,u)\}$.

Figure 2.4 illustrates $HLCA(u)$. The solid lines and
circles represent the edges and vertices of an inverted
spanning tree of an undirected graph. The dotted lines
represent the edges in the graph $G - T$ emerging from a
particular vertex $u$.

To compute $HLCA(u)$, $\forall u \in V$, we may first use the
lowest common ancestor algorithm to find $LCA(u,v)$,
$\forall (u,v) \in E - E' \cup \{(u,u)\}$, and then apply Lemma 2.2
to find $HLCA(u)$, $\forall u \in V$. However, in doing so, we
will require $O(\lceil |E - E'|/nK \rceil \cdot \lg n + n/K)$ time
if $nK$ ($K > 0$) processors are available. In this section, we
show a way of finding $HLCA(u)$, $\forall u \in V$, in
$O(n/K + \lg n \cdot \lg\lg n)$ time with $nK$
($\lg n > K \ge 1$) processors or in $O(n/K + \lg n)$ time with
$nK$ ($K \ge \lg n$) processors. This method allows us to design
optimal parallel algorithms for the graph theoretic problems
discussed in the following sections.

Figure 2.4 An illustration of $HLCA(u)$.

2.7.2  Computing  the  HLCA(u)'s  Based  on  Preorder  Numbering 

The  method  is  based  on  the  preorder  numbering  of  the 
vertices  in  an  ordered  spanning  tree  T(V,E')  of  G.  We  denote 
the  preorder  number  of  a  vertex  v  by  pre(v) . 

Lemma 2.10: Let $u, v \in V$. Then $v \preceq u$ (that is, $v$ is
an ancestor of $u$ or $v = u$) iff
$pre(v) \le pre(u) < pre(v) + nd(v)$, where $nd(v)$ is the number
of descendants of $v$.

Proof: Immediate from the definition of preorder traversal. ■

Lemma 2.11: Let $(u,v), (u,w) \in E - E'$.

(i)  if $pre(v) \le pre(w) \le pre(u)$,
     then $depth(LCA(u,v)) \le depth(LCA(u,w))$;

(ii) if $pre(v) \ge pre(w) \ge pre(u)$,
     then $depth(LCA(u,v)) \le depth(LCA(u,w))$.

Proof: (i) By Lemma 2.10, $pre(LCA(u,v)) \le pre(v)$ and
$pre(u) < pre(LCA(u,v)) + nd(LCA(u,v))$. Therefore
$pre(LCA(u,v)) \le pre(w) < pre(LCA(u,v)) + nd(LCA(u,v))$. By
Lemma 2.10, $LCA(u,v) \preceq w$. Hence,
$depth(LCA(u,v)) \le depth(LCA(u,w))$. Part (ii) can be
proved similarly. ■

Lemma 2.11 points out that we can reduce the problem of
finding $HLCA(u)$ to that of finding the lowest common
ancestor of two particular vertices in
$\{v \mid (u,v) \in E - E'\} \cup \{u\}$.




Definition: Let $u \in V$, $W = \{v \mid (u,v) \in E - E'\} \cup \{u\}$.

$pmax(u) = v$, where $v \in W$ and $pre(v) \ge pre(w)$, $\forall w \in W$;

$pmin(u) = v$, where $v \in W$ and $pre(v) \le pre(w)$, $\forall w \in W$.

Corollary 2.12:
$HLCA(u) = \min_{\preceq} \{LCA(u, pmin(u)), LCA(u, pmax(u))\}$.

Proof: Immediate from Lemma 2.11. ■

Corollary 2.13: HLCA(u) = LCA(pmin(u), pmax(u)).

Proof: From Corollary 2.12, HLCA(u) ≤ pmin(u) and
HLCA(u) ≤ pmax(u). Thus, HLCA(u) ≤ LCA(pmin(u), pmax(u)).

By definition, pre(pmin(u)) ≤ pre(u) ≤ pre(pmax(u)). This
implies pre(LCA(pmin(u), pmax(u))) ≤ pre(u)
< pre(LCA(pmin(u), pmax(u))) + nd(LCA(pmin(u), pmax(u))). By
Lemma 2.10, LCA(pmin(u), pmax(u)) ≤ u. Therefore
LCA(pmin(u), pmax(u)) ≤ LCA(u, pmin(u)) and
LCA(pmin(u), pmax(u)) ≤ LCA(u, pmax(u)). By Corollary 2.12,
LCA(pmin(u), pmax(u)) ≤ HLCA(u). ■

Lemma 2.14: Let T(V,E') be a directed tree whose vertices
have been labelled in preorder; then finding HLCA(u), ∀u ∈ V,
can be done in O(n/K + lg n) time with nK (K ≥ 1) processors on a
PRAM.

Proof: To compute pmax(u) and pmin(u), ∀u ∈ V, we need
O(n/K + lg K) time with nK (K ≥ 1) processors (Lemma 2.2), and to
find HLCA(u), ∀u ∈ V, we need to find the lowest common
ancestors of the n (pmin(u), pmax(u)) pairs. This takes
O(n/K + lg n) time with nK (K > 0) processors (Theorem 2.6). ■
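
The reduction of Corollary 2.13 can be illustrated by the following sequential Python sketch. It is not the PRAM algorithm of Lemma 2.14: the tree is assumed to be given as a dictionary of ordered son lists, the preorder numbers are obtained by an ordinary traversal, and the helper names (preorder_and_depth, lca, hlca_all) are hypothetical.

```python
def preorder_and_depth(children, root):
    """Preorder numbers (1-based) and depths by a plain traversal (illustration only)."""
    pre, depth, stack, counter = {}, {root: 0}, [root], 0
    while stack:
        v = stack.pop()
        counter += 1
        pre[v] = counter
        for c in reversed(children.get(v, [])):   # leftmost son is visited first
            depth[c] = depth[v] + 1
            stack.append(c)
    return pre, depth


def lca(u, v, parent, depth):
    """Lowest common ancestor by walking up the father pointers."""
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u


def hlca_all(children, root, non_tree_edges):
    """HLCA(u) = LCA(pmin(u), pmax(u)) for every vertex u (Corollary 2.13)."""
    parent = {c: p for p, cs in children.items() for c in cs}
    pre, depth = preorder_and_depth(children, root)
    W = {v: [v] for v in pre}                     # W always contains u itself
    for a, b in non_tree_edges:
        W[a].append(b)
        W[b].append(a)
    result = {}
    for u, ws in W.items():
        pmin = min(ws, key=pre.get)
        pmax = max(ws, key=pre.get)
        result[u] = lca(pmin, pmax, parent, depth)
    return result
```

Only two LCA queries' worth of information (pmin and pmax) is kept per vertex, which is what allows the n queries of Lemma 2.14 to replace one query per non-tree edge.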

Figure 2.4 gives an illustration of the above lemmas
and corollaries. The numbers in the circles are the preorder
numbers of the vertices. For instance, the preorder number
of u is 21. For convenience, we name each vertex by its
preorder number. It can be easily checked that
depth(LCA(u,12)) ≤ min(depth(LCA(u,18)), depth(LCA(u,16))), and
that depth(LCA(u,28)) ≤ depth(LCA(u,24)). Furthermore,
pmin(u) = 12, pmax(u) = 28, and LCA(12,28) = 3, which is clearly
HLCA(u).

2.7.3  Computing  the  Preorder  Numbers 

The crucial step in computing HLCA(u), ∀u ∈ V, is to
determine the preorder numbers efficiently. The common way
of numbering the vertices of a tree in preorder is to
traverse the tree. However, this results in an O(n) time
algorithm, which is undesirable. In the following lemma, we
show that we can carry out preorder numbering in parallel by
computation rather than by traversing the tree.

Lemma 2.15: Let T(V,E') be an ordered tree with root r. For each v ∈ V,

pre(v) = Σ_{s ∈ ANC(v)} Σ_{t ∈ EBRO(s)} nd(t) + na(v)

       = Σ_{s ∈ ANC(v)−{r}} nds(F(s), rank(s)−1) + 1 + depth(v),

where ANC(v) is the set of all ancestors of v;
EBRO(s) is the set of all elder brothers of s;
nd(t) is the number of descendants of t;
na(v) is the number of ancestors of v;
nds(v,j) is the total number of descendants of the
first j sons of v;
and rank(s) is the rank of s, i.e. the position of s among
all its brothers.

Proof: Trivial. ■

Let us consider the inverted spanning tree given in
Figure 2.4 again. Consider the vertex u with pre(u) = 21. The
ancestors of u are the vertices 21, 17, 15, 7, 3 and 1. The
numbers of descendants of the elder brothers of each of these
vertices except the root are 3, 1, 7, 3, and 1 respectively.
These numbers sum up to 15. The number of ancestors of u is
6; this gives rise to a total sum of 21, which is the
preorder number of u.
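
The second form of the formula in Lemma 2.15 can be checked with a short sequential Python sketch. The tree representation (a dictionary of ordered son lists) and the function name preorder_by_formula are assumptions made for illustration; as in the example above, a vertex is counted among its own descendants and ancestors.

```python
def preorder_by_formula(children, root):
    """pre(v) = sum of nds(F(s), rank(s) - 1) over non-root ancestors s, plus 1 + depth(v)."""
    parent, rank = {}, {}
    for p, cs in children.items():
        for j, c in enumerate(cs, start=1):
            parent[c], rank[c] = p, j

    # Breadth-first listing: every father appears before its sons.
    order = [root]
    for v in order:
        order.extend(children.get(v, []))

    # nd(v): number of descendants of v, counting v itself.
    nd = {v: 1 for v in order}
    for v in reversed(order):
        if v != root:
            nd[parent[v]] += nd[v]

    # nds[v][j]: total number of descendants of the first j sons of v; nds[v][0] = 0.
    nds = {}
    for v in order:
        sums = [0]
        for c in children.get(v, []):
            sums.append(sums[-1] + nd[c])
        nds[v] = sums

    pre = {}
    for v in order:
        total, depth, s = 0, 0, v
        while s != root:                      # s ranges over ANC(v) - {r}
            total += nds[parent[s]][rank[s] - 1]
            depth += 1
            s = parent[s]
        pre[v] = total + 1 + depth
    return pre
```

No vertex is visited in traversal order when pre(v) is evaluated; each value depends only on quantities attached to the ancestors of v, which is the property exploited by Algorithm Preorder.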

Using Lemma 2.15, we want to show that the preorder
numbers pre(v), ∀v ∈ V, can be determined in O(n/K + lg n · lg lg n)
time with nK (K ≥ 1) processors. We assume that an inverted tree
T is given, represented by an array T[1..2, 1..n] such that
{<T[1,i], T[2,i]> | 1 ≤ i ≤ n} = E'. (Specifically, T[1,i] = i
and T[2,i] = F(i), 1 ≤ i ≤ n. We assume T[2,r] = 0 for the root r.)

Algorithm Preorder:

Step 1: Compute the array F+ and the vector depth for
T;

Step 2: Order the sons of every vertex in T, i.e.
compute rank(v), ∀v ∈ V;

Step 3: Find nds(v,j), ∀v ∈ V, 1 ≤ j ≤ n(v), where n(v) is the
number of sons of v;

Step 4: Compute pre(v), ∀v ∈ V. ■

Lemma 2.16: Algorithm Preorder takes O(n/K + lg n · lg lg n) time
with nK (lg n > K ≥ 1) processors or O(n/K + lg n) time with
nK (K ≥ lg n) processors on the PRAM.

Proof: Step 1 can be done in O(n/K + lg n) time (Theorem 2.3).

In Step 2, the ordered pairs {<T[2,i], T[1,i]> | 1 ≤ i ≤ n} are
sorted. This can be done in O(lg n · lg lg n) time with n
processors or in O(lg n) time with n lg n processors [BORO82].
Assuming that the sorted T is stored in T'[1..2, 1..n], then
T' can be divided into segments such that in each segment,
the first row contains the same vertex v in every entry, and
the second row contains the set of all sons of v in T. The
relative position of vertex i in the second row of the
segment in which i resides is the rank of i, i.e. rank(i).

In Step 3, nd(v), ∀v ∈ V, are first computed by scanning
the ((n−1)−depth(v))th column of F+ and counting the number
of occurrences of v. By Lemma 2.2, this takes O(n/K + lg K)
time. After this, nds(v,j), ∀v ∈ V, 1 ≤ j ≤ n(v), are computed using
the following formula:

nds(v,j) = Σ_{i=1}^{j} nd(s_i), 1 ≤ j ≤ n(v),

where s_i denotes the ith son of v. It has been shown in [KOGG73]
that the partial sums Σ_{i=1}^{j} a_i, 1 ≤ j ≤ n, can be computed in
O(lg n) time if n processors are given. Since each vertex v has n(v)
sons, the time needed to compute nds(v,j), 1 ≤ j ≤ n(v), is
O(lg(n(v))) if n(v) processors are assigned to v. (This is
possible if we make use of the sorted array T'.) As a
result, all these partial sums, nds(v,j), 1 ≤ j ≤ n(v), ∀v ∈ V, can
be computed in parallel in max{O(lg(n(v)))} = O(lg n) time
with Σ n(v) = n−1 processors.
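
As an illustration of how n partial sums can be obtained in ⌈lg n⌉ parallel steps, the following Python sketch simulates recursive doubling, with each loop iteration standing for one synchronous step of n processors. It is written in the spirit of the technique cited from [KOGG73]; whether the cited paper uses exactly this schedule is an assumption, and the function name is hypothetical.

```python
def parallel_prefix_sums(a):
    """Inclusive partial sums by recursive doubling; returns (sums, number of rounds)."""
    x = list(a)
    n, shift, rounds = len(x), 1, 0
    while shift < n:
        # One parallel step: every position reads values from the previous round only.
        x = [x[i] + x[i - shift] if i >= shift else x[i] for i in range(n)]
        shift *= 2
        rounds += 1
    return x, rounds
```

Applied to the numbers 3, 1, 7, 3, 1 of the example following Lemma 2.15, the sketch needs three rounds for five elements.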
Finally, in Step 4, pre(v), ∀v ∈ V, is computed using the
formula given in Lemma 2.15. We assume nds(v,0) = 0, ∀v ∈ V. Note
that ANC(v) is available in the vth row of F+ starting from
column (n−1)−depth(v) to column (n−1), and na(v) equals
depth(v) + 1. By Lemma 2.2, this takes O(n/K + lg K) time.

Summing up, pre(v), ∀v ∈ V, can be determined in
O(n/K + lg n · lg lg n) time with nK (lg n > K ≥ 1) processors or in
O(n/K + lg n) time with nK (K ≥ lg n) processors. ■

2.7.4  Conclusions 

Theorem 2.17: Computing HLCA(u), ∀u ∈ V, can be done in
O(n/K + lg n · lg lg n) time with nK (lg n > K ≥ 1) processors or in
O(n/K + lg n) time with nK (K ≥ lg n) processors on the PRAM.

Proof: Lemmas 2.14, 2.16. ■

Remark:

Since the first write-up of our algorithm for computing
preorder numbers [TSIN82a], we have discovered that Schwartz
described a method for computing preorder numbers on the PSN
which is similar to ours [SCHW80].

2.8  The  Bridge-connectivity  Problem 

2.8.1 Introduction

The previous best algorithm for finding the bridges in
an undirected graph on the PRAM first appeared in [SAVA77].
It was then reported in [SAVA81]. This algorithm achieves
the O(lg² n) time bound with n² lg n processors. In this
section, we present an optimal parallel algorithm which
achieves the O(lg² n) time bound using only n⌈n/lg² n⌉
processors.

The  bridge-connectivity  problem  consists  of  two 
subproblems,  namely  finding  the  bridges  and  determining  the 
bridge-connected  components  of  an  undirected  graph.  We 
consider  the  problem  of  finding  the  bridges  first. 

2.8.2  Finding  All  the  Bridges  in  an  Undirected  Graph 

The efficiency of our algorithm relies on the following
lemmas.

Lemma 2.18: Let G(V,E) be a connected, undirected graph. If
e = (a,b) ∈ E is a bridge of G, then every inverted spanning
tree of G contains either <a,b> or <b,a>.

Proof: Trivial. ■

Lemma 2.19: e is not a bridge iff e is on a fundamental
cycle.

Proof: Immediate from the definition of bridges. ■

The input data is again assumed to be an adjacency
matrix of G(V,E). By definition, an edge e is a bridge in G
iff e is not contained in any cycle in G. Since there are a
total of |E| edges and a possibly exponential number of
cycles in G, basing our algorithm to find the set of all
bridges on the definition may require an unmanageable number
of operations. Fortunately, thanks to Lemmas 2.18 and 2.19,
we need only consider those edges in an inverted spanning
tree of G and the fundamental cycles generated from that
spanning tree. This allows us to start with a manageable
number of edges and cycles.

Let T(V,E') be an inverted spanning tree of G and
e = <a,F(a)> ∈ E'. We shall show below (Theorem 2.20) that e is
a bridge iff e is not included in the same fundamental cycle
as any descendant of a in T. In other words, e does not lie
on any of the paths [i *→ HLCA(i)] where i is a descendant of
a in T. Using this characteristic of bridges, we can find
all the bridges efficiently.

Theorem 2.20: Let T(V,E') be an inverted spanning tree of a
connected, undirected graph G, and e = <a,b> ∈ E'.

(a,b) is a bridge of G iff for each descendant i of a, there
does not exist (i,j) in G−T such that
depth(LCA[i,j]) < depth(a).

Proof: Let e = <a,b> ∈ E' be such that (a,b) is a bridge in G.

If there exists (i,j) in G−T such that i is a descendant of
a in T and depth(LCA[i,j]) < depth(a), then the path
[i → j *→ LCA[i,j] *→ b → a *→ i] is a cycle containing e. This
leads to a contradiction by Lemma 2.19.

Conversely, if e = (a,b) is not a bridge, then by Lemma 2.19,
e is on a fundamental cycle C, i.e. there exists (i,j) in
G−T such that

C: [i → j *→ LCA[i,j] *→ i].

e ≠ (i,j) because e is not in G−T (Lemma 2.18). As a result, e
is either on the path [j *→ LCA[i,j]] or on the path
[LCA[i,j] *→ i], implying

depth(j) ≥ depth(a) > depth(b) ≥ depth(LCA[i,j]) or
depth(i) ≥ depth(a) > depth(b) ≥ depth(LCA[i,j]). Hence in either
case (renaming i and j if necessary) there exists (i,j) in G−T
such that i is a descendant of a and depth(LCA[i,j]) < depth(a). ■

Algorithm Bridges:

1. Construct an inverted spanning tree T(V,E') for G(V,E).

2. Compute HLCA(u), ∀u ∈ V.

3. Compute a(u), ∀u ∈ V, where

a(u) = min{depth(HLCA(w)) | u ≤ w}.

4. For each <u,F(u)> ∈ E', check if depth(u) ≤ a(u). (* (u,F(u))
is a bridge iff depth(u) ≤ a(u) *) ■
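
Steps 3 and 4 can be sketched sequentially in Python as follows. The sketch assumes that the father function, the depths and the values HLCA(u) produced by Steps 1 and 2 are already available as dictionaries; the function name find_bridges and this input format are illustrative assumptions, and the bottom-up loop replaces the parallel minimum computation on F+.

```python
def find_bridges(parent, depth, hlca):
    """parent[v] is F(v) (None for the root); hlca[v] is HLCA(v).

    Returns the set of tree edges (u, F(u)) that are bridges.
    """
    # Step 3: a(u) = min of depth(HLCA(w)) over all descendants w of u (u included).
    a = {u: depth[hlca[u]] for u in parent}
    for u in sorted(parent, key=depth.get, reverse=True):   # deepest vertices first
        p = parent[u]
        if p is not None:
            a[p] = min(a[p], a[u])
    # Step 4: (u, F(u)) is a bridge iff depth(u) <= a(u).
    return {(u, parent[u]) for u in parent
            if parent[u] is not None and depth[u] <= a[u]}
```

Processing the vertices in order of decreasing depth guarantees that a(u) is final before it is passed on to F(u).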

The complexity of Algorithm Bridges is analyzed below.

Theorem 2.21: Algorithm Bridges runs in O(n/K + lg² n) time
with nK (K ≥ 1) processors on a PRAM.

Proof: With nK (K ≥ 1) processors, Step 1 takes O(n/K + lg² n)
time (Theorem 2.5). Step 2 takes O(n/K + lg n · lg lg n) time
(Theorem 2.17). By using the array F+ for T(V,E'), Steps 3
and 4 take O(n/K + lg K) time (Lemmas 2.1 & 2.2). Hence,
Algorithm Bridges runs in O(n/K + lg² n) time with nK (K ≥ 1)
processors. ■

2.8.3  The  Bridge-connected  Components  of  an  Undirected  Graph 

Once the bridges of a connected, undirected graph are
determined, its bridge-connected components can be
determined. Specifically, we eliminate all the bridges in G
and then use Algorithm MOD.CONNECT [CHIN81, CHIN82] to find
the connected components of the resulting graph. Each of the
connected components thus found is a bridge-connected
component of G.

The algorithm obviously runs in O(n/K + lg² n) time with
nK (K ≥ 1) processors on a PRAM.

2.9 The Bridge-connectivity Augmentation Problem

No previous result was reported in the literature for
this problem. The algorithm presented here is a parallel
version of Eswaran and Tarjan's sequential
algorithm [ESWA76]. We list their algorithm below and refer
the reader to the reference cited for its correctness. Note
that the undirected graph G may be disconnected in this
section.


Algorithm Brconnect [ESWA76]:

(* Given an undirected graph G(V,E), add the minimum number
of edges to G so that the resulting graph is
bridge-connected *)

1. Find the bridge-connected components of G;

2. Condense G into an acyclic graph G0(V0,E0) by collapsing
each bridge-connected component of G into a single
vertex;

3. Construct an edge set A1 to connect the trees of G0 so
that the resulting graph T0(V0, E0 ∪ A1) is an undirected
tree. A1 is defined as follows:

Let {v(i) | 1 ≤ i ≤ 2m} be a set of vertices of G0 such
that

(i) v(2i−1) and v(2i) are each a pendant or
an isolated vertex in the ith tree;

(ii) v(2i−1) = v(2i) iff the ith tree is an
isolated vertex.

Then A1 = {(v(2i), v(2i+1)) | 1 ≤ i < m};

4. Convert T0 into an inverted tree for which the root has
two or more sons and label its vertices with preorder
numbers, then sort its pendants by preorder number;

5. Construct an edge set A2 to bridge-connect T0, where A2
is defined as follows:

Let {v(i) | 1 ≤ i ≤ p} be the sorted sequence of
pendants where p is the number of pendants.

A2 = {(v(i), v(i+⌊p/2⌋)) | 1 ≤ i ≤ ⌈p/2⌉};

6. Let V1 be a set of vertices containing exactly one vertex
from each bridge-connected component in G and
π: V0 → V1 be a 1-1 correspondence such that π⁻¹(v)
corresponds to the bridge-connected component
containing v. Define A = {(π(u), π(v)) | (u,v) ∈ A1 ∪ A2}. (*
A is the minimum set of edges bridge-connecting G *) ■

The construction of the sets A1 and A2 forms the main part
of our algorithm; we handle them in the following lemmas.


Lemma 2.22: Given an adjacency matrix of G0(V0,E0),
constructing the edge set A1 can be done in O(m/K + lg² m) time
with mK (K ≥ 1) processors, where m = |V0|.

Proof: First, find the connected components of G0, i.e.
compute C(v), ∀v ∈ V0, such that C(u) = C(v) iff u, v belong to
the same connected component in G0. This takes O(m/K + lg² m)
time with mK (K ≥ 1) processors [CHIN81, CHIN82]. Then sort the
set {<C(v),v> | 1 ≤ v ≤ m}. This takes O(lg m · lg lg m) time with m
processors [BORO82]. After that, assign one processor to each
<C(v),v>, 1 ≤ v < m, and compare the C value of that element with
the C value of the following element, say <C(u),u>, in the
sorted sequence. The processor will add (u,v) to A1 iff
C(u) ≠ C(v). This takes O(1) time with m−1 processors. Hence,
constructing A1 can be done in O(m/K + lg² m) time with mK (K ≥ 1)
processors. ■
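
The sort-and-compare step of this proof can be sketched in a few lines of Python. The component labels C(v) are assumed to be given as a dictionary, and the function name connect_components is hypothetical; the list comprehension plays the role of the m−1 processors, each comparing one element with its successor.

```python
def connect_components(component):
    """component[v] is C(v); returns the edges (u, v) added to A1."""
    pairs = sorted((c, v) for v, c in component.items())     # sort <C(v), v>
    # Each element is compared with the following element <C(u), u>.
    return [(u, v) for (cv, v), (cu, u) in zip(pairs, pairs[1:]) if cu != cv]
```

For k connected components the sketch returns k−1 edges, one between each pair of consecutive segments of the sorted sequence.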


Lemma 2.23: Given an adjacency matrix of the undirected tree
T0(V0,E'), the edge set A2 can be constructed in O(m/K + lg² m)
time with mK (K ≥ 1) processors, where m = |V0|.

Proof: Find an inverted tree T0' of T0 such that the root has
more than one son. This takes O(m/K + lg² m) time with mK (K ≥ 1)
processors (Theorem 2.5). Then label the vertices with
preorder numbers and identify the pendants as follows: sort
{<F(v),v> | 1 ≤ v ≤ m}. Clearly, the vertices having the same
father are in consecutive positions after sorting. To avoid
write conflicts, only the processor assigned to the leftmost
vertex of each segment of vertices having the same father in
the sorted sequence will write a 1 into an appropriate entry
of an array mark to indicate that the father is a
nonpendant. Consequently, mark(v) = 1 iff v is a nonpendant.
(Each mark(v) has the initial value 0.) After that, sort
{<mark(v),v> | 1 ≤ v ≤ m} to separate the pendants from the
nonpendants. Finally, sort the pendants in ascending order
by preorder by sorting the set {<pre(v),v> | mark(v) = 0}. Let p
be the number of pendants and rank(v) be the position of v
in the sorted sequence. Add (u,v) to A2 if
rank(u) = rank(v) + ⌊p/2⌋. All these steps take at most
O(lg m · lg lg m) time with m processors. ■
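
The pairing rule for A2 can be illustrated by the following sequential Python sketch. It assumes the tree is given as an adjacency-list dictionary with at least three vertices, obtains the preorder of the pendants by a plain traversal instead of sorting, and uses the hypothetical name pendant_pairing.

```python
def pendant_pairing(adj):
    """adj: vertex -> list of neighbours of an undirected tree with >= 3 vertices.

    Returns A2 = {(v(i), v(i + floor(p/2))) | 1 <= i <= ceil(p/2)}.
    """
    # Any vertex of degree >= 2 can serve as a root with two or more sons.
    root = next(v for v in adj if len(adj[v]) >= 2)
    pendants, stack, seen = [], [root], {root}
    while stack:                          # preorder traversal from the root
        v = stack.pop()
        if len(adj[v]) == 1:
            pendants.append(v)            # pendants are collected in preorder
        for w in reversed(adj[v]):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    p = len(pendants)
    half = p // 2
    # 0-based index i corresponds to the 1-based index i + 1 of the definition.
    return [(pendants[i], pendants[i + half]) for i in range((p + 1) // 2)]
```

When p is odd, the middle pendant appears in two of the added edges, which is why ⌈p/2⌉ edges are produced.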

With the help of Lemmas 2.22 and 2.23, we are ready to
analyze the performance of Algorithm Brconnect.

Theorem 2.24: Algorithm Brconnect runs in O(n/K + lg² n) time
with nK (K ≥ 1) processors on a PRAM.

Proof: In Step 1, the bridges and bridge-connected
components of G(V,E) are determined. This takes O(n/K + lg² n)
time with nK (K ≥ 1) processors [CHIN81, CHIN82]. In Step 2, to
condense G(V,E) into G0(V0,E0), we just have to determine V0
and E0. Clearly, V0 can be formed by adding to it exactly
one vertex from each bridge-connected component of G (each
of these vertices serves to represent a bridge-connected
component). For convenience, we choose the smallest-numbered
vertex from each component. As a result, we immediately have
V0 = {v | v = C(v)} and E0 = {(C(u),C(v)) | (u,v) is a bridge of G}.
Note that the array {C(v) | 1 ≤ v ≤ n} and the bridges are
determined in Step 1, and as a consequence, determining V0
and E0 takes O(1) time with n processors. Steps 3, 4 and 5
take O(n/K + lg² n) time with nK (K ≥ 1) processors by Lemmas
2.22 and 2.23 (note that m ≤ n). Finally, in Step 6, due to
the way we construct V0, the vertices of V0 are also
vertices of V; therefore, A = A1 ∪ A2 and no transformation is
required. Thus this step takes O(1) time. Hence, the theorem
follows. ■

2.10  The  Biconnectivity  Problem 

2.10.1 Introduction

Like the bridge-connectivity problem, this problem also
consists of two subproblems, namely finding the set of all
biconnected components and finding the set of all separation
vertices in an undirected graph. The previous best results
on this problem were due to Savage and Ja'Ja' [SAVA81]. They
presented two algorithms; one runs in O(lg² n) time with
n³/lg n processors while the other runs in O(lg² n · lg B) time
with |E|n + n² lg n processors, where B is the number of
biconnected components in the graph.

In this section, the algorithm we present runs in
O(lg² n) time with only n⌈n/lg² n⌉ processors.

2.10.2  Finding  all  Biconnected  Components  in  an  Undirected 
Graph 

In  this  section,  we  present  an  optimal  parallel 
algorithm  for  finding  all  biconnected  components  of  a 
connected,  undirected  graph  G(V,E).  Since  a  biconnected 
component  can  be  completely  determined  by  its  vertex  set,  it 
suffices  to  find  the  vertex  sets  of  all  the  biconnected 
components  of  G.  Our  algorithm  is  based  on  the  following 
lemma.

Lemma 2.25: (i) For each edge (a,b) ∈ E there exists a unique
biconnected component in G containing the edge.

(ii) All edges in the same simple cycle in G
belong to the same biconnected component in
G.

(iii) Let C1 and C2 be two simple cycles having an
edge in common. Then C1 and C2 belong to the
same biconnected component in G.

The general strategy of our algorithm is as follows.
Given the undirected graph G, we begin by constructing an
inverted spanning tree T of G. From T, we generate a set of
fundamental cycles of G. From Lemma 2.25(i) and (ii), every
fundamental cycle falls entirely within a unique biconnected
component. We then use these fundamental cycles as the
building blocks and begin to merge those cycles having
common edges into bigger circuits. By Lemma 2.25(iii), each
of these circuits belongs to exactly one biconnected
component. We then merge the circuits having common edges
into yet bigger circuits. This process is carried on until
no further merge is possible. Then every circuit generated
contributes to a biconnected component in G.

To  make  the  fundamental  cycles  easier  to  handle,  we 
remove  the  non-tree  edge  from  each  of  them.  This  is 
legitimate  because  no  two  fundamental  cycles  can  possibly 
intersect  at  a  non-tree  edge.  The  advantage  is  that  the 
number of edges involved is now reduced from O(n²) to n−1.
This  modification  also  implies  that  we  are  in  fact 
manipulating  the  branches  of  T  rather  than  the  fundamental 
cycles  and  that  the  process  of  merging  the  fundamental 
cycles  has  become  the  process  of  merging  branches  into 
subtrees.  Consequently,  when  the  merging  process  is 
complete,  the  result  is  a  set  of  trees  each  of  which  is  a 
spanning  tree  of  a  distinct  biconnected  component  of  G. 
Obviously,  the  vertex  sets  of  these  trees  are  the  vertex 
sets  of  the  biconnected  components  of  G. 

The merging process cannot be time-consuming, for
otherwise the performance of the entire algorithm will be
degraded. The method we use in our algorithm is to reduce
the merging process to the problem of finding the connected
components of an undirected graph derived from the inverted
spanning tree T.

Definition: Let T(V,E') be an inverted spanning tree of
G(V,E). Let e1 = <a,F(a)>, e2 = <b,F(b)> ∈ E'. Then
e1 A e2

iff (i) e2 is on [a *→ HLCA(a)] or e1 is on
[b *→ HLCA(b)];

or (ii) (a,b) ∈ E−E' and neither a ≤ b nor b ≤ a in T.

From the definition, if e1 A e2 then e1 and e2 belong to
the same fundamental cycle. It is easily shown that if e1 A e2
and e2 A e3, then e1 and e3 belong to the same simple cycle in
G. This is easily generalized to:

Lemma 2.26: If e1 A e2, e2 A e3, ..., e(t−1) A et, then there
exists a simple cycle in G containing both e1 and et.

Definition: Let G(V,E) be an undirected graph and T(V,E') be
its inverted spanning tree. Then G"(E',E") is an undirected
graph in which (e1,e2) ∈ E" iff e1 A e2.

The following theorem establishes the relationship
between G and G".

Theorem 2.27: e and e' belong to the same connected
component in G" iff e and e' belong to the same biconnected
component in G.

Proof: Let e and e' belong to the same connected component
in G". Then there exists a path e, e1, ..., et, e' in G".
This implies that e A e1 and e1 A e2 and ... and et A e'. By Lemma
2.26, e and e' belong to the same cycle in G. By Lemma
2.25(ii), e and e' belong to the same biconnected component
in G.

Let e and e' belong to the same biconnected component in G.
Then there exists a simple cycle C containing e and e' in G.
Let Γ be the set of fundamental cycles such that C = ⊕Γ.
Construct an undirected graph H(Γ,Σ) such that (C1,C2) ∈ Σ iff
C1 and C2 have a common edge. Clearly, H cannot be
disconnected, for otherwise C cannot be a simple cycle. Let
P = {Ci}, 1 ≤ i ≤ t, be the shortest path in H such that e ∈ C1,
e' ∈ Ct. Let ei be a common edge of Ci and C(i+1), 1 ≤ i < t.

Let (ai,bi) be the edge in G−T determining Ci, 1 ≤ i ≤ t. Let
e(ai), e(bi) be the edges in T such that e(ai) = <ai,F(ai)>
and e(bi) = <bi,F(bi)>; then in each Ci, we have: (i)
e(ai) A e(bi) and (e(i−1) A e(ai) or e(i−1) A e(bi)) and (ei A e(ai) or
ei A e(bi)); or (ii) e(i−1) A e(ai) and ei A e(ai); or (iii)
e(i−1) A e(bi) and ei A e(bi). In any of the cases, there is a
path from e(i−1) to ei in G". In particular, there is a path
from e to e1 and a path from e(t−1) to e' in G". Joining all
these paths together, we have a path from e to e' in G".
Hence, e and e' belong to the same connected component in
G". ■
Algorithm Biconnect:

1. Find an inverted spanning tree T(V,E') of G(V,E);

2. Compute HLCA(v), ∀v ∈ V;

3. Construct an undirected graph G"(E',E") such that
(e1,e2) ∈ E" iff e1 A e2;

4. Find the connected components {Bj} of G". (* Note: Every
connected component of G" uniquely determines the
vertex set of a biconnected component in G and vice
versa. *) ■
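
Steps 3 and 4 can be sketched sequentially in Python. The sketch assumes that the father function, the depths, the values HLCA(v) and the list of non-tree edges are already available; instead of building an adjacency matrix for G" and calling a connected-components algorithm, it merges related tree edges with a simple union-find structure, which is a substitution made for brevity. Each tree edge <a,F(a)> is named by its tail a, and the function name is hypothetical.

```python
def biconnected_tree_edge_classes(parent, depth, hlca, non_tree_edges):
    """Group the tree edges <a, F(a)> (named by a) into connected components of G"."""
    label = {a: a for a in parent if parent[a] is not None}

    def find(x):
        while label[x] != x:
            label[x] = label[label[x]]      # path halving
            x = label[x]
        return x

    def is_ancestor(x, y):                  # True iff x is an ancestor of y (or x == y)
        while depth[y] > depth[x]:
            y = parent[y]
        return x == y

    for a in list(label):
        # Condition (i): every tree edge on [a *-> HLCA(a)] is related to <a, F(a)>.
        b = a
        while b != hlca[a]:
            label[find(b)] = find(a)
            b = parent[b]
    for a, b in non_tree_edges:
        # Condition (ii): (a, b) is a non-tree edge and neither end is an ancestor of the other.
        if not is_ancestor(a, b) and not is_ancestor(b, a):
            label[find(a)] = find(b)

    classes = {}
    for a in label:
        classes.setdefault(find(a), set()).add(a)
    return sorted(sorted(c) for c in classes.values())
```

A class consisting of a single tree edge whose tail has no related edge corresponds to a bridge, in agreement with the remark that follows Theorem 2.28.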


Theorem 2.28: Algorithm Biconnect runs in O(n/K + lg² n) time
with nK (K ≥ 1) processors on a PRAM.

Proof: With nK (K ≥ 1) processors available, Step 1 takes
O(n/K + lg² n) time (Theorem 2.5). Step 2 takes
O(n/K + lg n · lg lg n) time (Theorem 2.17). Step 3 can be carried
out as follows: Construct an adjacency matrix M" for G": for
every e, e' ∈ E', M"[e,e'] and M"[e',e] are set to 1 iff (i) e' is
on the path [a *→ HLCA(a)] or (ii) (a,b) is in G−T and neither
a ≤ b nor b ≤ a in T, where e = <a,F(a)> and e' = <b,F(b)>. Due to
|E'| = O(n) and the availability of F+, testing the above
conditions takes O(n/K) time with nK (K ≥ 1) processors (Lemma
2.1). Step 4 takes O(n/K + lg² n) time [CHIN81, CHIN82]. Hence,
Algorithm Biconnect takes O(n/K + lg² n) time with nK (K ≥ 1)
processors. ■

For  completeness,  we  would  like  to  point  out  that  the 
algorithm  for  finding  all  biconnected  components  can  be  used 
to  determine  the  set  of  all  bridges  as  well.  This  is  based 
on  the  fact  that  an  edge  e  of  G  is  a  bridge  iff  e  is  a 
biconnected  component  of  G. 
2.10.3 Finding all the Separation Vertices in an Undirected
Graph

Let T(V,E') be an inverted spanning tree of G(V,E) and
Bj be a biconnected component of G. Then Bj ∩ T must be
connected and is thus a tree. Let a ∈ V. If a is not the root
r of T, then a is a separation vertex of G iff a is the root
of Bj ∩ T for some biconnected component Bj of G. Moreover, r
is a separation vertex iff r is the root of Bi ∩ T and Bj ∩ T,
where Bi, Bj are two distinct biconnected components of G.
These ideas are embodied in the following lemma.

Lemma 2.29: Let T(V,E') be an inverted spanning tree of
G(V,E), r be the root of T and {Bk} be the set of
biconnected components of G.

a is a separation vertex of G
iff a is the root of Bj ∩ T for some j, if a ≠ r;
or a is the root of Bi ∩ T and Bj ∩ T for some i ≠ j, if a = r.
Proof: Only if part: Let a be a separation vertex of G.
There exist biconnected components Bi, Bj, i ≠ j, such that a
belongs to both Bi and Bj ([AHO74], Lemma 5.4, p. 181).

If a ≠ r, we may assume without loss of generality that
<a,F(a)> belongs to Bi ∩ T. Let rj be the root of Bj ∩ T. There
exists a path P1 in Bj from a to rj. There also exist a path
P2 in T from rj to LCA(a,rj) and a path P3 in T from a to
LCA(a,rj). Clearly, P2 and P3 contain no edges in Bj. But
then P1, P2 and P3 give rise to a simple cycle in G which
will contradict the fact that Bi, Bj are biconnected
components unless a = rj. Thus a is the root of Bj ∩ T.

If a = r, then there exists a path P from ri to a in T.
Since Bi ∩ T is connected, all the edges on P must belong to
Bi ∩ T. But then ri cannot be the root of Bi ∩ T unless a = ri.
The same argument implies that a = rj.

If part: Let a = r and a be the root of Bi ∩ T and Bj ∩ T where
i ≠ j. Let si and sj be a son of a in Bi ∩ T and Bj ∩ T
respectively. Suppose after removing a from G, the resulting
graph remains connected. Then there must be a path from si
to sj in G not passing through a. However, this path and the
edges (a,si), (a,sj) will form a cycle in G, which implies
that Bi and Bj cannot be biconnected components. Therefore
the removal of a from G must disconnect G, which means that a
is a separation vertex.

Let a ≠ r and a be the root of some Bj ∩ T. Consider F(a)
and sj where sj is a son of a in Bj ∩ T. F(a) does not belong
to Bj, for otherwise a cannot be the root of Bj ∩ T. By
applying an argument similar to the one above, we can show
that removing a from G would result in disconnecting F(a)
and sj. Hence a is a separation vertex of G. ■

As  a  consequence  of  Lemma  2.29,  the  algorithm  for 
finding  the  biconnected  components  can  be  used  to  determine 
the  set  of  all  separation  vertices  of  G  as  follows. 

Theorem 2.30: The set of separation vertices can be found in
O(n/K + lg² n) time with nK (K ≥ 1) processors on a PRAM.

Proof: First, the set of all biconnected components is
determined. This takes O(n/K + lg² n) time with nK (K ≥ 1)
processors (Theorem 2.28). Next, the head of each e ∈ E',
head(e), is determined. This obviously takes O(1) time with
nK processors. Then the set of all head(e)'s is divided
into groups such that those e's belonging to the same
biconnected component have their head(e)'s grouped
together. This involves sorting and takes O(lg n · lg lg n) time
with n processors or O(lg n) time with n lg n
processors [BORO82]. Finally, the head(e) with the smallest
depth in each group is selected; these head(e)'s form the
set of separation vertices, except that r is included in the
set iff r is selected from two or more groups. This step takes
O(n/K + lg K) time with nK processors (Lemma 2.2). ■
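
The selection step of this proof can be sketched sequentially in Python. The sketch assumes that each tree edge e = <a,F(a)> is named by a, that head(e) denotes F(a), and that a label identifying the biconnected component of each tree edge is already available; these conventions and the function name are assumptions made for illustration.

```python
def separation_vertices(parent, depth, component):
    """component[a] labels the biconnected component containing tree edge <a, F(a)>."""
    top = {}                                  # component label -> shallowest head(e)
    for a, label in component.items():
        head = parent[a]                      # head(e) for e = <a, F(a)> (assumed convention)
        if label not in top or depth[head] < depth[top[label]]:
            top[label] = head
    root = next(v for v in parent if parent[v] is None)
    times_selected = list(top.values()).count(root)
    # The root counts only if it is selected from two or more groups.
    return {h for h in top.values() if h != root or times_selected >= 2}
```

The dictionary top plays the role of the groups obtained by sorting: one minimum-depth head is kept per biconnected component.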

Finally, to determine the biconnectivity of a
connected, undirected graph G, we can check the number of
separation vertices it has. Clearly, G is biconnected iff
there are no separation vertices. This takes O(n/K + lg² n) time
with nK (K ≥ 1) processors.

2.11 Conclusions

In the preceding sections, we assume in most cases that
$nK$, the number of processors available, satisfies the
condition $K \ge 1$. This means that the number of processors
available is not less than $n$. In fact, this assumption is
made for convenience only because most of the previous work
assumed unbounded parallelism. To make use of some of those
results in the course of developing our algorithms, we found
it most convenient to assume $K \ge 1$. Nevertheless, it is not
difficult to extend our results to cases where $0 < K < 1$ if
Brent's theorem is used.

Theorem 2.31 [BREN74]: If a synchronized computation $C$
consisting of a total of $q$ operations can be performed in $t$
parallel time units with sufficiently many processors, then
$C$ can be performed in $\lceil (q - t)/p \rceil + t$ time units with $p$ ($> 0$)
processors.

Using the above theorem, it is easily shown that Lemma
2.2 can be generalized to: "Given an array of $n^2$ elements,
$(a_{ij})$, $1 \le i,j \le n$, and $nK$ ($K > 0$) processors,
$A(i) = a_{i1} * a_{i2} * \cdots * a_{in}$, $1 \le i \le n$, can be computed in
$\lceil (n^2 - n - \lg n)/nK \rceil + \lg n = O(n/K + \lg n)$ time
units." Similarly, Preparata's sorting algorithm can be
executed in $\lceil n \lg^2 n / nK \rceil + \lg n = O(\lg^2 n / K + \lg n)$ time units if
$nK$ ($K > 0$) processors are available.
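The bound of Theorem 2.31 is easy to evaluate numerically. The following Python sketch is only an illustration: the function names `brent_time` and `row_products_time` are hypothetical, and the operation count $q = n(n-1)$ and depth $t = \lceil \log_2 n \rceil$ are the values assumed for combining every row of an $n \times n$ array with a balanced tree.

```python
import math


def brent_time(q: int, t: int, p: int) -> int:
    """Bound of Theorem 2.31: ceil((q - t) / p) + t time units on p processors."""
    if p <= 0:
        raise ValueError("p must be positive")
    if q < t:
        raise ValueError("q must be at least t")
    return (q - t + p - 1) // p + t  # integer ceiling, no floating point


def row_products_time(n: int, k: float) -> int:
    """Bound for combining each row of an n x n array with about nK processors."""
    q = n * (n - 1)                # n rows, n - 1 binary operations per row
    t = math.ceil(math.log2(n))    # depth of a balanced combining tree
    p = max(1, int(n * k))         # nK processors, at least one
    return brent_time(q, t, p)
```

For example, with $n = 16$ the bound is 19 time units for $K = 1$ and 63 time units for $K = 1/4$, which matches the $O(n/K + \lg n)$ behaviour stated above.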

Extension of all of our results from $nK$ ($K \ge 1$) to $nK$ ($K > 0$)
can be accomplished in a similar way.

The parallel algorithms presented in this chapter are
optimal for dense graphs except for the problem of finding
the lowest common ancestors of vertex pairs in a directed
tree, and the problem of finding all fundamental cycles in
an undirected graph. If an optimal algorithm for finding the
lowest common ancestors running in $O((n+q)/nK)$ time with
$nK$ ($K > 0$) processors is found, then the performance of the
algorithm for finding the fundamental cycles is also
improved without any modification. Moreover, this
achievement will provide us with an alternate way to compute
$HLCA(v)$, $\forall v \in V$, which is crucial in the design of optimal
parallel algorithms for the last four problems.

We feel that several techniques we use in this chapter
deserve further attention as they may be useful in
developing efficient algorithms for other graph theoretic
problems or even for problems in other disciplines.

The first is the one used in handling graph theoretic
problems which are strongly related to cycles. If we were to
handle all the cycles directly, we could hardly expect the
resulting algorithm to be polynomial with respect to the
time-processor product because the number of cycles in a
graph can be exponentially large. The technique we use is to
restrict our domain of consideration from the set of cycles
to the set of fundamental cycles (note that there are at
most $O(n^2)$ of them). We also reduce the number of edges to
be considered from $|E|$ to $n - 1$ by constructing an inverted
spanning tree $T$ for the given graph $G$ and considering only
the edges in $T$. This elaboration allows us to start with a
manageable number of items which require no more than $O(n^4)$
operations. Then by computing the function $HLCA(v)$'s, much
of the information conveyed by the fundamental cycles can be
stored under the $HLCA(v)$'s. Consequently, the possible
number of operations is further reduced to $O(n^2)$ which makes
the $O(n^2)$ time-processor product possible. We believe that
this technique may prove to be useful in other graph
theoretic problems which are cycle-oriented.

The second is described in Lemma 2.2 which simply says
that to compute an associative operation involving $n$ items,
if the well-known recursive-doubling technique is used, we
need $n/2$ processors to achieve the $O(\lg n)$ time bound.
However, if we have only $n/\lg n$ processors available, then we
can still achieve the $O(\lg n)$ time bound with only a slightly
larger constant factor. This technique is very useful as it
allows us to reduce the number of processors used without
affecting the order of magnitude of time. The technique was
known previously but was not properly utilized.
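The idea can be simulated sequentially. The sketch below is an illustration under stated assumptions, not the construction of Lemma 2.2 itself: the name `blocked_reduce` is hypothetical, each of roughly $n/\lg n$ simulated processors first folds a block of about $\lg n$ items, and recursive doubling is then applied to the partial results. The function reports the number of simulated processors and parallel steps.

```python
import math
from functools import reduce
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def blocked_reduce(items: List[T], op: Callable[[T, T], T]) -> Tuple[T, int, int]:
    """Return (result, processors used, parallel steps) for an associative op."""
    n = len(items)
    if n == 0:
        raise ValueError("items must not be empty")
    block = max(1, math.ceil(math.log2(n))) if n > 1 else 1
    # Phase 1: every processor folds its own block sequentially (block - 1 steps).
    partial = [reduce(op, items[i:i + block]) for i in range(0, n, block)]
    processors = len(partial)
    steps = block - 1
    # Phase 2: recursive doubling on the partial results, one step per halving.
    while len(partial) > 1:
        nxt = [op(partial[i], partial[i + 1]) for i in range(0, len(partial) - 1, 2)]
        if len(partial) % 2:
            nxt.append(partial[-1])  # an unpaired item is carried forward unchanged
        partial = nxt
        steps += 1
    return partial[0], processors, steps
```

Summing the integers 1 to 1024 this way uses 103 simulated processors and 16 parallel steps, compared with 512 processors and 10 steps for plain recursive doubling. The order of operands is preserved, so the operator needs to be associative but not commutative.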

The third one makes use of the observation that if a
computation requires a number of iterations and after each
iteration, the problem size is reduced by at least half,
then the total amount of time required (in terms of order of
magnitude) to complete the computation is the same as that
required by the first iteration. Specifically, $\sum_{i=0}^{k} T/2^i < 2T$
for any $k \ge 0$.

Chapter 3

IMPLEMENTATION  ON  THE  MMM  MODEL 

3.1 Introduction

In  this  chapter,  we  propose  a  general  computer  model, 
called  MMM,  which  includes  all  the  parallel  computer  models 
on  which  an  ordinary  matrix  multiplication  algorithm  exists. 
Since  almost  every  existing  computer  model  has  an  algorithm 
for  the  matrix  multiplication  problem,  the  model  proposed 
has  a  great  degree  of  generality.  In  fact,  it  includes  all 
of  the  well-known  existing  parallel  computer  models  listed 
below:

MCN (Mesh-connected Networks) [CANN69, DEKE81, ATAL82],
PSN (Perfect Shuffle Networks) [STON71],
CCC (Cube-connected Cycles) [PREP81],
OTN (Orthogonal Tree Networks) [NATH81],
OTC (Orthogonal Tree Cycles) [NATH81],
SIMD-CCC (SIMD Cube-connected Computers) [DEKE81],
PRAM (SIMD Shared Memory Model allowing read
conflicts) [WYLL79],
WRAM (SIMD Shared Memory Model allowing read and write
conflicts) [SHIL81].

Let $O(t(n))$ and $H(n)$ denote the time and hardware
resources (in terms of number of processors and chip area)
required by the $n \times n$ ordinary matrix multiplication
algorithm. We shall show that our algorithms take at worst a
factor of $\max(\lg d, \lg d'') + 1$, $1 \le d, d'' \le n$, more time and the same
amount of hardware resources as those required by the matrix
multiplication algorithm on the MMM. Since on many of the
well-known existing models, the matrix multiplication
algorithm takes at most $O(\lg n)$ time and $H(n)$ hardware
resources, our algorithms are therefore bounded above (footnote 8) by
$O(\lg n \cdot (\max(\lg d, \lg d'') + 1))$ in time and $H(n)$ in hardware
resources on those models. This result turns out to be very
efficient as it outperforms the previously known best
algorithms on many models.

3.2  The  Computer  Model  MMM 


3.2.1  Definitions 

Definition: Let $(S, +, *, 0, 1)$ be a ring and $M_n$ be the set
of $n \times n$ matrices over $S$. An ordinary matrix multiplication
algorithm for $M_n$ is an algorithm which takes advantage of
only the associative property of $+$ in multiplying any two
matrices in $M_n$.

Note that the well-known Strassen algorithm [STRA69] is
not an ordinary matrix multiplication algorithm because it
makes use of the additive inverse property of $+$. There are
two reasons why we consider only ordinary matrix
multiplication algorithms here. The first is that for
most of the existing computer models, the only known
algorithm for multiplying matrices is an ordinary matrix
multiplication algorithm. The second reason is that in the
rest of this chapter, we will frequently encounter matrices
whose elements are chosen from closed semirings [AHO74], and
closed semirings do not possess the additive inverse
property. As a result, matrix multiplication algorithms for
matrices over a ring which make use of the additive inverse
property cannot be applied to these matrices.

8 The term 'bounded above' needs some clarification. Here we
mean that the algorithms will take $O(t(n) \cdot (\max(\lg d, \lg d'') + 1))$
time and $H(n)$ hardware resources if the algorithms are
indeed implemented in a way using matrix multiplication.
However, as will be clear in the following sections,
our algorithms do not rely on matrix multiplication; other
more efficient techniques could be used if they were
available.

The MMM (Matrix Multiplication Model) has the following
features:

(i) there exists an ordinary matrix multiplication
algorithm;

(ii) each processor contains a constant number of registers
and is capable of carrying out any of the operations $+$,
$-$, $*$ (footnote 9), $\vee$, $\wedge$, $\neg$, $=$, $\ne$, $<$, $>$ in constant time;

(iii) communication between interconnected processors and
between registers within the same processor takes
constant time.

In representing the given undirected graph, an
adjacency matrix $M$ is used. The entry $M[i,j]$ of $M$ is stored
in the $M$ register of processor $PE[i,j]$, $1 \le i,j \le n$. In general,
a register, say $A$, in $PE[i,j]$ is denoted by $A[i,j]$. Again,
without loss of generality, we assume $G$ is connected and the
vertex set $V = \{1, 2, 3, \ldots, n\}$ throughout this chapter. We use
$d$ to denote the diameter of $G$, $\Sigma$ to denote the summation of
integers and an APL type of syntax to describe our
algorithms. As a result, 0 will represent both the integer
zero and the boolean constant 'false' and 1 will represent
both the integer one and the boolean constant 'true'. As an
example, $c * (a = b)$ is equivalent to if $a = b$ then $c$ else 0.

9 As a matter of fact, multiplication is not used in our
algorithms; the $*$'s appearing in the algorithms are just a
shorthand for the if ... then ... else statement.

Definition: A function $f$ is called an extended monadic
function w.r.t. $i$, $j$ if the arguments of $f$ are of the form
$OP[i,j]$ where $OP$ is either the name of a register or a
function of $i$, $j$. We denote it by $f[i,j]$.

The  following  lemma  dominates  the  rest  of  this  chapter. 

Lemma 3.1: The following operation can be carried out on
the MMM using the same order of magnitude of time and
hardware resources as the ordinary matrix multiplication
algorithm:

$M[i,j] := f_3(\Sigma_k\, \Pi(f_1[i,k], f_2[k,j]))$, $\forall i,j$, $1 \le i,j \le n$,

where $M[i,j]$ is a register of processor $PE[i,j]$; $f_1$, $f_2$, $f_3$
are extended monadic functions w.r.t. $i,k$; $k,j$ and $i,j$
respectively; $\Pi$ is a composite function of the arithmetic
and boolean operations mentioned in the definition of the MMM
and $\Sigma$ is an associative operator.

Proof: Trivial. ■
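The operation in Lemma 3.1 has the shape of a matrix product in which the scalar product and the sum are replaced by other operators. The Python sketch below illustrates that shape sequentially. It is an assumption-laden illustration: the name `generalized_product` is hypothetical, `pi` and `sigma` stand for $\Pi$ and $\Sigma$, the matrices `f1` and `f2` hold the already evaluated values $f_1[i,k]$ and $f_2[k,j]$, and `f3` is simplified to a function of the reduced value only.

```python
from functools import reduce
from typing import Any, Callable, List

Matrix = List[List[Any]]


def generalized_product(
    f1: Matrix,
    f2: Matrix,
    pi: Callable[[Any, Any], Any],
    sigma: Callable[[Any, Any], Any],
    f3: Callable[[Any], Any] = lambda x: x,
) -> Matrix:
    """Compute M[i][j] = f3(sigma over k of pi(f1[i][k], f2[k][j]))."""
    n = len(f1)
    return [
        [f3(reduce(sigma, (pi(f1[i][k], f2[k][j]) for k in range(n))))
         for j in range(n)]
        for i in range(n)
    ]
```

With `pi` as multiplication and `sigma` as addition this is the ordinary matrix product; with `and`/`or` it composes boolean relations, and with `+`/`min` it performs one step of a shortest-path computation. Only the associativity of `sigma` is relied upon, which mirrors the definition of an ordinary matrix multiplication algorithm.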

3.2.2 Some Preliminary Results

In  the  following  sections,  in  proving  the  resource 
complexities  of  each  step  of  the  algorithms,  we  shall  use 
the  following  strategy:  we  show  that  the  step  can  be  carried 
out  by  a  method  similar  to  that  used  by  the  matrix 
multiplication  algorithm.  The  advantage  of  this  strategy 
should  be  obvious  as  it  allows  us  to  carry  out  our  analysis 
without  having  to  deal  with  the  detailed  structure  of  the 
model  (e.g.  how  the  processors  are  connected  together).  A 
typical  example  is  data  routing  which  is  always  needed  in 
parallel  algorithms  and  whose  implementation  and  efficiency 
are greatly model-dependent. Using the above strategy, data
routing can be handled in a model-independent way.

Broadcasting  the  contents  of  a  register  columnwise  or 
rowwise  is  needed  frequently  in  subsequent  discussions.  We 
give  a  bound  on  its  resource  complexities  in  the  following 
lemma  using  the  above-mentioned  strategy. 

Lemma 3.2: Let $PE[i,j]$, $1 \le i,j \le n$, be a set of processors. The
time and hardware resources needed to broadcast the contents
of register $M[a,b]$ columnwise (rowwise) are at worst the same
as those needed by the ordinary matrix multiplication
algorithm on the MMM.

Proof: To broadcast the contents of $M[a,b]$ columnwise, we
perform

$M[w,b] := \Sigma_k((\mathit{dummy}[w,k] * 0) + (M[k,b] * (k = a)))$, $1 \le w \le n$,

or simply

$M[w,b] := \Sigma_k((\mathit{dummy}[w,k] * 0) + M[k,b])$, $1 \le w \le n$, if $M[a,b]$ is
the only possible non-zero term in the column.

Here dummy can be any register; its appearance is just to
ensure that the resulting expression conforms to the one
stated in Lemma 3.1, and it is irrelevant to the computation.
Clearly, $M[w,b] = M[a,b]$, $1 \le w \le n$. By Lemma 3.1, the lemma
follows.

Broadcasting rowwise can be handled in a similar way. ■

We  have  to  emphasize  that  we  do  not  mean  that 
broadcasting  the  contents  of  a  register  columnwise  or 
rowwise  has  to  be  actually  done  in  the  above  way.  We  merely 
want  to  show  that  its  complexity  is  bounded  above  by  that  of 
matrix  multiplication. 

The  following  are  some  basic  results  which  will  be 
referred  to  frequently  in  the  rest  of  this  chapter. 

Lemma 3.3: The following operations can be carried out in
$O(t(n))$ time with $H(n)$ hardware resources on the MMM.

(i) $R[u,v] := \Sigma_{k=1}^{n} f(u,k)$, $1 \le u,v \le n$;

(ii) $R[u,j] := \Sigma_{k=1}^{j} f(u,k)$, $1 \le j \le n$, $1 \le u \le n$,

where $R$ is any register, $f(u,k)$ is an extended monadic
function w.r.t. $u$, $k$ and $\Sigma$ is an associative operator.

Proof:

$\Sigma_{k=1}^{n} f(u,k)$ is equivalent to $\Sigma_k(f(u,k) + (\mathit{dummy}[k,v] * 0))$.
Similarly,

$\Sigma_{k=1}^{j} f(u,k)$ is equivalent to $\Sigma_k(f(u,k) * (k \le j))$.

From Lemma 3.1, the lemma follows. ■

Lemma 3.4: Let $O(t'(n))$ and $H'(n)$ be the time and hardware
resources required by the all-pair shortest path algorithm
on the MMM and $G$ be an unweighted (directed or undirected)
graph with diameter $d$; then

$t'(n) = t(n) \cdot (\lg d + 1)$ and $H'(n) = H(n)$.

Proof: Let $M$ be an adjacency matrix of $G$.

Construct matrix $D$ such that

$D[i,j] = \begin{cases} 1 & \text{if } M[i,j] = 1 \text{ and } i \ne j; \\ 0 & \text{if } i = j; \\ +\infty & \text{if } M[i,j] = 0 \text{ and } i \ne j. \end{cases}$

Compute the matrix $D^d$ as follows:

$D^1 = D$

$D^{2^i}[u,v] = \min_k (D^{2^{i-1}}[u,k] + D^{2^{i-1}}[k,v])$, $i \ge 1$.

A simple induction will reveal that $D^{2^i}[u,v]$ contains the
length of the shortest path from $u$ to $v$ consisting of no
more than $2^i$ edges. Therefore after $\lg d$ iterations, $D^d[u,v]$
will contain the shortest distance from $u$ to $v$ in $G$. One
more iteration is required to verify that $D^d$ has been
computed. By Lemma 3.1, $t'(n) = t(n) \cdot (\lg d + 1)$ and $H'(n) = H(n)$. ■
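The repeated squaring in the proof can be sketched sequentially in Python. This is a minimal illustration under stated assumptions: the names `min_plus_square` and `all_pairs_distances` are hypothetical, vertices are numbered from 0 rather than 1, and `math.inf` plays the role of $+\infty$. The function also returns the number of squarings performed, including the final one that only verifies that nothing changed.

```python
import math
from typing import List, Tuple

INF = math.inf


def min_plus_square(d: List[List[float]]) -> List[List[float]]:
    """One (min, +) product of d with itself."""
    n = len(d)
    return [[min(d[u][k] + d[k][v] for k in range(n)) for v in range(n)]
            for u in range(n)]


def all_pairs_distances(adj: List[List[int]]) -> Tuple[List[List[float]], int]:
    """Return (distance matrix, number of squarings) for an unweighted graph."""
    n = len(adj)
    d = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
         for i in range(n)]
    rounds = 0
    while True:
        nxt = min_plus_square(d)
        rounds += 1
        if nxt == d:          # the verification iteration of the proof
            return d, rounds
        d = nxt
```

On a path of five vertices (diameter 4) the loop performs $\lg 4 + 1 = 3$ squarings, in agreement with the factor $\lg d + 1$ of the lemma. Entries that remain infinite mark pairs in different connected components.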

Lemma 3.5: Computing the transitive closure of an adjacency
matrix can be done in $t''(n)$ time with $H''(n)$ hardware
resources where $t''(n) \le t'(n)$ and $H''(n) \le H'(n)$.

Proof: Let the transitive closure matrix be $M^*$; then
$M^*[a,b] = 1$ iff $D^d[a,b] \ne +\infty$. ■

3.3 Constructing a Breadth-first Search (Directed) Spanning
Forest of an Undirected Graph

In this section, we shall present an efficient
algorithm for constructing a directed breadth-first
search (BFS) spanning forest for an undirected graph on the
MMM. The method used was first 'implicitly' given by Savage
for the PRAM [SAVA77]. It also appeared in [DEKE81] and
[ATAL82].

Theorem 3.6: Given an $n \times n$ adjacency matrix $M$ of an
undirected graph $G(V,E)$, a directed BFS spanning forest for
$G$ can be determined in $O(t(n) \cdot (\lg d + 1))$ time with $H(n)$
hardware resources on the MMM.

Proof:  We  shall  construct  an  adjacency  matrix  T  for  the  BFS 
spanning  forest  (since  an  inverted  spanning  tree  is  more 
convenient  in  some  cases,  the  transpose  of  T,  T'  is  also 
constructed) . 

First, compute the all-pair shortest path matrix $D^d$ for
$G$ and the transitive closure matrix $M^*$ using Lemmas 3.4 and
3.5. Then for each connected component of $G$, choose the
smallest-numbered vertex in it as the root of its spanning
tree. Since every smallest-numbered vertex of a connected
component satisfies the following property, namely, $v$ is the
smallest-numbered vertex in a connected component iff
$M^*[v,k] = 0$, $\forall k < v$, the set of all these vertices can be
determined easily as follows: compute the partial sums
$\mathit{Rank}[u,j] := \Sigma_{k=1}^{j} M^*[u,k]$, $1 \le j \le n$ (Lemma 3.3(ii)); then every
processor computes locally $\mathit{Rep}[u,j] := (\mathit{Rank}[u,j] = 1) \wedge (u = j)$,
$1 \le u,j \le n$. After this step, it should be clear that $\mathit{Rep}[r,r] = 1$
iff $r$ is the smallest-numbered vertex of a connected
component iff $r$ is the root of a spanning tree (note that
$\mathit{Rep}$ is a boolean array).

After all the roots $r$ of the BFS spanning forest are
determined, the level of every vertex in the forest is also
determined. This is because $\mathit{level}(v) = D^d[r,v] + 1$, $\forall v \in V$, where
$r$ is the root of the tree containing $v$. We shall store
$\mathit{level}(v)$ into $\mathit{level}[v,v]$. This is accomplished as follows:

$\mathit{level}[r,v] := D^d[r,v] + 1$;

(Broadcast columnwise) $\mathit{level}[k,v] := \mathit{level}[r,v]$, $\forall v,k \in V$;

At this point, $\mathit{level}[v,v] = \mathit{level}(v)$, $\forall v \in V$.

Next, select a father for each vertex $v$ which is not a
root. This is accomplished in two steps. In the first step,
all the vertices whose levels are one less than that of $v$
are identified:

(Broadcast rowwise) $\mathit{level}[v,k] := \mathit{level}[v,v]$, $\forall v,k \in V$;

$F'[v,j] := \bigvee_k (\mathit{level}[v,k] = ((1 + \mathit{level}[k,j]) * (k = j)))$.

The second statement needs some explanation: after
broadcasting $\mathit{level}[v,v]$, $\forall v \in V$, rowwise, $\mathit{level}[v,w] = \mathit{level}(v)$,
$\forall v,w \in V$. As a result, the right-hand side of the statement is
equivalent to $\bigvee_k (\mathit{level}(v) = \text{if } (k = j) \text{ then } (1 + \mathit{level}(k)) \text{ else } 0)$
which in turn is equivalent to if $(\mathit{level}(v) = (1 + \mathit{level}(j)))$
then 1 else 0. Hence, $F'[v,j] = 1$ iff $\mathit{level}(v) = \mathit{level}(j) + 1$.

In the second step, the largest-numbered vertex which is one
level higher than $v$ and is adjacent to $v$ in $G$ is selected as
the father of $v$ in the BFS spanning forest:

$F[v,v] := \max_k (\text{if } (F'[v,k] \wedge M[v,k]) \text{ then } k \text{ else } 0)$;
(Lemma 3.3(i))

Note that $F[v,v]$, for $v \ne r$, contains the father $F(v)$ of $v$ in
the BFS spanning forest.

Finally, construct an adjacency matrix $T$ and its transpose
$T'$ to represent the BFS spanning forest. This is
accomplished by the following computations:

(Broadcast columnwise:) $F'[k,v] := F[v,v]$, $\forall v,k \in V$;

(Broadcast rowwise:) $F[v,k] := F[v,v]$, $\forall v,k \in V$;

$T[w,v] := (w = F'[w,v])$, $\forall v,w \in V$;

$T'[v,w] := (w = F[v,w])$, $\forall v,w \in V$.

Thus, $T$ and $T'$ are boolean matrices such that $T[u,v] = 1$
(resp. $T'[u,v] = 1$) iff $u$ is the father (resp. a son) of $v$.

From Lemmas 3.1, 3.2, 3.4 and 3.5, we have: finding a
directed BFS spanning forest of an undirected graph takes
$O(t(n) \cdot (\lg d + 1))$ time with $H(n)$ hardware resources. ■
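The construction in the proof can be traced with a sequential Python sketch. It is an illustration only, under several assumptions: the names `distances` and `bfs_forest` are hypothetical, vertices are numbered from 0, a missing father is represented by `None` instead of 0, and plain loops replace the broadcasts and the matrix-product-like steps of the MMM.

```python
import math
from typing import List, Optional, Tuple

INF = math.inf


def distances(adj: List[List[int]]) -> List[List[float]]:
    """All-pair distances by repeated (min, +) squaring."""
    n = len(adj)
    d = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
         for i in range(n)]
    while True:
        nxt = [[min(d[u][k] + d[k][v] for k in range(n)) for v in range(n)]
               for u in range(n)]
        if nxt == d:
            return d
        d = nxt


def bfs_forest(adj: List[List[int]]) -> Tuple[List[int], List[Optional[int]], List[List[bool]]]:
    """Return (level, father, T) where T[w][v] is True iff w is the father of v."""
    n = len(adj)
    d = distances(adj)
    reach = [[d[u][v] != INF for v in range(n)] for u in range(n)]  # M*
    # Rep: v is a root iff it is the only vertex numbered <= v that it reaches.
    is_root = [sum(reach[v][k] for k in range(v + 1)) == 1 for v in range(n)]
    root_of = [min(k for k in range(n) if reach[v][k]) for v in range(n)]
    level = [int(d[root_of[v]][v]) + 1 for v in range(n)]
    father: List[Optional[int]] = [None] * n
    for v in range(n):
        if not is_root[v]:
            # largest-numbered neighbour that lies exactly one level above v
            father[v] = max(k for k in range(n)
                            if adj[v][k] and level[v] == level[k] + 1)
    t = [[father[v] == w for v in range(n)] for w in range(n)]
    return level, father, t
```

The tie-breaking rule (largest-numbered candidate) is the one used in the proof; any candidate one level above $v$ and adjacent to it would also yield a valid BFS forest.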

3.4  Finding  the  Lowest  Common  Ancestors  of  all  Vertex  Pairs 
in  a  Directed  Tree 

In this section, we implement the algorithm for finding
the lowest common ancestors presented in Chapter 2 on the
MMM.

Theorem 3.7: Given an adjacency matrix $T$ of a directed tree
with diameter $d$, computing the lowest common ancestors of
all vertex pairs of the directed tree can be done in
$O(t(n) \cdot (\lg d + 1))$ time with $H(n)$ hardware resources on the
MMM.

Proof: First, construct the transpose $T'$ of $T$ as follows:
every processor executes the statement $F[u,v] :=$ if $T[u,v]$
then $u$ else 0; locally. Since there is one and only one
nonzero $F[u,v]$ value in each column, we may use Lemma 3.2 to
broadcast these nonzero $F[u,v]$'s columnwise. After this
step, $F[v,v]$ contains the father of $v$ in the directed tree.
Then perform:

(Broadcast rowwise:) $F[v,k] := F[v,v]$, $\forall v,k \in V$;

$T'[v,u] := (u = F[v,u])$, $\forall u,v \in V$.

From Lemma 3.2, this step takes $O(t(n))$ time with $H(n)$
hardware resources.

Next, compute the transitive closures $T^*$ and $(T')^*$ of $T$ and
$T'$ respectively. By Lemma 3.5, this step takes
$O(t(n) \cdot (\lg d + 1))$ time with $H(n)$ hardware resources. Note that
in the course of computing the transitive closures, the
level of each vertex is also determined (recall that
$\mathit{level}(v) = 1 + D^d[r,v]$, $\forall v \in V$, where $r$ is the root of $T$) and
$\mathit{level}(v)$ is stored in $\mathit{level}[v,v]$.

Finally, compute the matrix $LCA$:

(Broadcast rowwise:) $\mathit{level}[v,w] := \mathit{level}[v,v]$;

$LCA[i,j] := (\max_\ell)_k \{ (T')^*[i,k] * (T^*[k,j] * k) \}$.

The above expression in the braces should be interpreted as:
if k is an ancestor of i
then if k is an ancestor of j then k
else 0;

The evaluation of the binary operation $(\max_\ell)\{a,b\}$ needs
some explanation. We proceed in two time units. In the first
time unit, $a$ and $b$ are transferred simultaneously from
processors $PE[i,k]$ and $PE[k,j]$ respectively to a processor
$PE[i,j,k]$ at which the binary operation is to be carried
out. In the second time unit, $\mathit{level}(a)$ and $\mathit{level}(b)$ are
transferred simultaneously from processors $PE[i,k]$ and
$PE[k,j]$ to $PE[i,j,k]$. The values of $\mathit{level}(a)$ and $\mathit{level}(b)$
are then compared in that processor and if $\mathit{level}(a)$ is
greater, then $a$ is the value of $(\max_\ell)\{a,b\}$, otherwise $b$ is
the value.

Computing the matrix $LCA$ takes $O(t(n))$ time with $H(n)$
hardware resources. ■
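The final step selects, among the common ancestors of $i$ and $j$, the one with the greatest level. A sequential Python sketch of that selection follows. It rests on assumptions that are not part of the text: the name `lowest_common_ancestors` is hypothetical, the tree is given as a father array with `None` at the single root, vertices are numbered from 0, every vertex counts as an ancestor of itself, and father pointers are walked instead of computing transitive closures.

```python
from typing import List, Optional


def lowest_common_ancestors(father: List[Optional[int]]) -> List[List[int]]:
    """Return LCA[i][j] for a rooted tree given by its father array."""
    n = len(father)
    # anc[k][v] plays the role of T*[k, v]: k is an ancestor of v (or v itself).
    anc = [[False] * n for _ in range(n)]
    level = [0] * n
    for v in range(n):
        k: Optional[int] = v
        depth = 0
        while k is not None:
            anc[k][v] = True
            depth += 1
            k = father[k]
        level[v] = depth  # the root has level 1
    # Among the common ancestors, pick the one with the greatest level.
    return [
        [max((k for k in range(n) if anc[k][i] and anc[k][j]),
             key=lambda k: level[k])
         for j in range(n)]
        for i in range(n)
    ]
```

Because the common ancestors of two vertices all lie on one path to the root, their levels are distinct and the maximum is unique.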

3.5  Finding  a  set  of  Fundamental  Cycles  of  an  Undirected 
Graph 

As with Section 5 of Chapter 2, we shall construct the
matrices $F^+$, $LCA$ and $P^+$ to represent the fundamental cycles
on the MMM. Since $LCA$ has been discussed in the last section
and $P^+$ can be easily determined from $\mathit{level}$:

$(P^+(v) =)\; P^+[v,v] := n - \mathit{level}[v,v]$,

we shall discuss only the construction of $F^+$.

Assume that an adjacency matrix $M$ of $G(V,E)$ is given.
We construct the matrices $T$ and $T'$ for a BFS spanning tree
of $G$. Clearly, the diameter of the BFS spanning tree is not
greater than that of $G$. Then using Lemma 3.4, we compute
the all-pair shortest path matrix $(T')^d$ for $T'$. Since for
any vertex $u$ in an inverted tree, all vertices reachable
from $u$ are located on the path from $u$ to the root,
the $u$th row of $(T')^d$ contains exactly one $i$ for
each $i$ in the range $[0, \mathit{level}(u))$, and contains no $j$ in the
range $[\mathit{level}(u), n]$. Consequently, $F^k(u)$ can be computed as
follows:

$F^+[u,k] := \Sigma_j ((T')^d[u,j] \stackrel{*}{=} (j,k))$,

where $\stackrel{*}{=}$ is defined as $a \stackrel{*}{=} (b,c) =$ if $a = c$ then $b$ else 0.
Computing $\stackrel{*}{=}$ can be done in a manner similar to that used for
computing $(\max_\ell)\{a,b\}$. Specifically, we proceed in two time
units. In the first time unit, $a$ and $b$ are transferred
simultaneously from processors $PE[u,j]$ and $PE[j,k]$
respectively to a processor $PE[u,k,j]$. In the second time
unit, $c$ is transferred from $PE[j,k]$ to $PE[u,k,j]$. $a$ and $c$
are then compared in that processor. If they are equal, $b$ is
the value of the computation, otherwise the result is 0.

Finally, adjusting the array $F^+$ is straightforward and
takes no more than $O(t(n) \cdot (\lg d + 1))$ time with $H(n)$ hardware
resources. ■

3.6  2-coloring  an  Undirected  Graph 

We  shall  implement  Algorithm  Bipartite  on  the  MMM  in 
this  section. 

Theorem 3.8: Given an adjacency matrix $M$ of an undirected
graph $G$, the 2-colorability problem can be solved in
$O(t(n) \cdot (\lg d + 1))$ time with $H(n)$ hardware resources.

Proof: We generate the matrices $T$ and $T'$ for a BFS spanning
tree $T$ of $G$ using Theorem 3.6 and the adjacency matrix $M'$ of
$G - T$ by computing $M'[i,j] := M[i,j] \wedge \neg(T[i,j] \vee T'[i,j])$ locally
at every processor. Note that $M'$ is a boolean matrix such
that $M'[i,j] = 1$ iff $(i,j)$ is an edge in $G$ but not in $T$. We
then examine every fundamental cycle in $G$ by testing if
$\mathit{level}(i) \ne \mathit{level}(j)$, for every $(i,j)$ in $G - T$, as follows:

(Broadcast rowwise:) $\mathit{level}[v,w] := \mathit{level}[v,v]$;

(Broadcast columnwise:) $\mathit{level}'[w,v] := \mathit{level}[v,v]$;

$\mathit{Flag}[i,j] := \mathit{level}[i,j] \ne \mathit{level}'[i,j]$.

The previous statement is equivalent to
$\mathit{Flag}[i,j] := \mathit{level}(i) \ne \mathit{level}(j)$. Here we employ a property of
the BFS spanning trees: if $(i,j)$ is an edge in $G - T$, then the
difference between the levels of $i$ and $j$ cannot be greater
than 1. As a consequence, the condition
"$\mathit{level}[i,j] \ne \mathit{level}'[i,j]$" is equivalent to that tested in Step
2(ii) of Algorithm Bipartite.

Now, we assume $\mathit{Flag}[i,j]$ has the initial value 1. We proceed
to compute $\bigwedge_{i,j} \mathit{Flag}[i,j]$ as follows:

$\mathit{Bipartite}[i,j] := \bigwedge_k \mathit{Flag}[i,k]$, $\forall i,j \in V$; (Lemma 3.3(i));

$\mathit{Bipartite}[i,j] := \bigwedge_k \mathit{Bipartite}[k,j]$, $\forall i,j \in V$;
(Lemma 3.3(i)).

At this point, $G$ is 2-colorable (bipartite) iff
$\mathit{Bipartite}[1,1] = 1$.

Finally, if $\mathit{Bipartite}[1,1] = 1$, we compute

$\mathit{Partition}[v,v] := \mathit{Bipartite}[v,v] \wedge (\mathit{level}[v,v] \text{ is odd})$ locally
at every processor. By Lemmas 3.1, 3.2, 3.5 and Theorem 3.6,
the total time taken is $O(t(n) \cdot (\lg d + 1))$ and the hardware
resources needed are $H(n)$. ■
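The test used in the proof (no edge may join two vertices on the same BFS level) can be checked with a short sequential Python sketch. It is an illustration under stated assumptions: the name `two_color` is hypothetical, vertices are numbered from 0, a queue-based BFS replaces the matrix computation of the levels, and all edges are tested rather than only those of $G - T$, which is harmless because tree edges always join adjacent levels.

```python
from collections import deque
from typing import List, Optional


def two_color(adj: List[List[int]]) -> Optional[List[bool]]:
    """Return the partition (True = odd BFS level) or None if G is not bipartite."""
    n = len(adj)
    level = [0] * n  # 0 means "not visited yet"; roots get level 1
    for r in range(n):
        if level[r]:
            continue
        level[r] = 1
        queue = deque([r])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and u != v and not level[v]:
                    level[v] = level[u] + 1
                    queue.append(v)
    # Flag[i][j] must hold for every edge: its end points lie on different levels.
    ok = all(level[i] != level[j]
             for i in range(n) for j in range(n)
             if adj[i][j] and i != j)
    return [lv % 2 == 1 for lv in level] if ok else None
```

A cycle on four vertices is split into its odd-level and even-level vertices, while a triangle is rejected because two of its vertices share a level.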

3.7  The  Bridge-connectivity  Problem 

In this section, we shall implement Algorithm Bridges
presented in Chapter 2 on the MMM. We shall determine the
set of bridge-connected components at the same time.

Theorem 3.9: Given the $n \times n$ adjacency matrix $M$ of $G(V,E)$, the
set of all bridges and bridge-connected components in $G$ can
be determined in $O(t(n) \cdot (\lg d + 1))$ time with $H(n)$ hardware
resources on the MMM.

Proof: We proceed in 6 steps.

In Step 1, we construct the matrices $T$ and $T'$ for a BFS
spanning tree $T$ of $G(V,E)$. As a consequence, the matrix
$\mathit{level}$ is also available. Recall that $\mathit{level}(v) = \mathit{level}[v,v]$,
$\forall v \in V$ (Theorem 3.6).

In Step 2, we compute $\ell LCA(i,j)$ which is the level of
$LCA(i,j)$ as follows:

Compute the transitive closures $T^*$ and $(T')^*$ of $T$ and
$T'$ respectively.

(Broadcast rowwise) $\mathit{level}[v,w] := \mathit{level}[v,v]$, $\forall v,w \in V$;

$\ell LCA[i,j] := \max_k \{ (T')^*[i,k] * (T^*[k,j] * \mathit{level}[k,j]) \}$.

The expression in the above statement can be interpreted as
if $((T')^*[i,k] \wedge T^*[k,j])$ then $\mathit{level}(k)$ else 0. Note that in
particular, $\ell LCA[v,v] = \mathit{level}(v)$, $\forall v \in V$.

In Step 3, we construct an adjacency matrix $M'$ for $G - T$
as well as the matrix $\ell HLCA$:

$M'[i,j] := M[i,j] \wedge (\neg(T[i,j] \vee T'[i,j]))$;

(note that $M'[v,v] = 1$, $\forall v \in V$)

$\ell HLCA[v,v] := \min_k \{\text{if } M'[v,k] \text{ then } \ell LCA[v,k] \text{ else } 0\}$,
(Lemma 3.3(i));

note that $\ell HLCA[v,v]$ contains the level of $HLCA(v)$ and
$0 < \ell HLCA[v,v] \le \mathit{level}(v)$, $\forall v \in V$.

In Step 4, we compute the array $a$:

(Broadcast rowwise:) $\ell HLCA[v,w] := \ell HLCA[v,v]$, $\forall w \in V$;

$a[v,w] := \min_k \{ T^*[v,k] * \ell HLCA[k,w] \}$;

Note that $a[v,v] = a(v)$. Moreover, $a[v,w] = a[v,v]$, $\forall v,w \in V$.

In Step 5, we compute the matrix $\mathit{Bridge}$:

$\mathit{Bridge}[u,v] := \bigvee_k (T[u,k] \wedge ((a[k,v] \ge \mathit{level}[k,v]) \wedge (k = v)))$;

The right-hand side of the above statement is equivalent to
$T[u,v] \wedge (a(v) \ge \mathit{level}(v))$; thus, $\mathit{Bridge}[u,v] = 1$ iff $(u,v)$ is a
bridge in $G$.

Finally, in Step 6, we compute the bridge-connected
components:

first remove the bridges:

$M''[i,j] := M[i,j] \wedge (\neg \mathit{Bridge}[i,j])$;

then compute the transitive closure $(M'')^*$.

From Lemmas 3.1, 3.2, 3.3 and 3.5, Theorem 3.6 and the
fact that the diameter of $M''$ cannot be greater than $d$, we
have: the bridge-connectivity problem can be solved in
$O(t(n) \cdot (\lg d + 1))$ time with $H(n)$ hardware resources on the
MMM. ■
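The bridge test of Steps 1 to 5 can be followed with a sequential Python sketch. It is an illustration under stated assumptions, not the MMM implementation: the name `find_bridges` is hypothetical, vertices are numbered from 0, a queue-based BFS and pointer walking replace the matrix steps, each vertex starts with its own level as the value for $HLCA$ (mirroring $M'[v,v] = 1$), and a tree edge $(F(v), v)$ is reported when $a(v) \ge \mathit{level}(v)$, that is, when no non-tree edge leaves the subtree rooted at $v$.

```python
from collections import deque
from typing import List, Optional, Tuple


def find_bridges(adj: List[List[int]]) -> List[Tuple[int, int]]:
    """Return the bridges of an undirected graph as sorted (father, son) pairs."""
    n = len(adj)
    level = [0] * n
    father: List[Optional[int]] = [None] * n
    order: List[int] = []
    for r in range(n):                      # Step 1: BFS spanning forest
        if level[r]:
            continue
        level[r] = 1
        queue = deque([r])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in range(n):
                if adj[u][v] and u != v and not level[v]:
                    level[v], father[v] = level[u] + 1, u
                    queue.append(v)

    def lca_level(x: int, y: int) -> int:   # Step 2: level of LCA(x, y)
        while x != y:
            if level[x] < level[y]:
                x, y = y, x
            x = father[x]                   # move the deeper vertex up
        return level[x]

    hl = level[:]                           # Step 3: level of HLCA(v)
    for v in range(n):
        for k in range(n):
            is_tree_edge = father[v] == k or father[k] == v
            if adj[v][k] and v != k and not is_tree_edge:
                hl[v] = min(hl[v], lca_level(v, k))

    a = hl[:]                               # Step 4: minimum over all descendants
    for v in reversed(order):               # sons are handled before fathers
        if father[v] is not None:
            a[father[v]] = min(a[father[v]], a[v])

    return sorted((father[v], v) for v in range(n)   # Step 5
                  if father[v] is not None and a[v] >= level[v])
```

For two triangles joined by a single edge, only the joining edge is reported; for a simple path every edge is a bridge. Removing the reported edges and taking connected components then gives the bridge-connected components of Step 6.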

3.8 The Bridge-connectivity Augmentation Problem

In this section, we shall implement Algorithm Brconnect
of Chapter 2 on the MMM. We proceed in a step-wise manner.
First we show how $G$ can be condensed into $G_0$ on the MMM in
Lemma 3.10. Then we construct the edge set $A_1$ in Lemma 3.11.
After that we discuss how a directed tree can be labelled in
preorder on the MMM in Lemma 3.12 and how the edge set $A_2$
can be constructed in Lemma 3.13. Finally, in Theorem 3.14,
we combine Lemmas 3.10-3.13 to derive the resource
complexities of Algorithm Brconnect on the MMM.

Lemma 3.10: Given an adjacency matrix $M$ of $G(V,E)$, the
forest $G_0(V_0,E_0)$ can be constructed in $O(t(n) \cdot (\lg d + 1))$ time
with $H(n)$ hardware resources on the MMM.

Proof: We shall construct an adjacency matrix $T_0$ to
represent $G_0$.

First note that we can construct $V_0$ by picking a
representative from each bridge-connected component of $G$.
For convenience, we pick the smallest-numbered vertex from
each bridge-connected component since this vertex can be
determined easily by using the method described in Theorem
3.6. As a result, we have the matrix $B\text{-}rep$ such that $B\text{-}rep[v,v] = 1$
iff $v$ is the smallest-numbered vertex of a bridge-connected
component iff $v \in V_0$. Note that in the course of computing
$B\text{-}rep$, we also compute the matrices $\mathit{Bridge}$ and $(M'')^*$ (see
Algorithm Bridges Steps 5 and 6).

To determine the edge set E0, we first determine, for
each u∈V0, the set V(u) = {k | (j,k) is a bridge and u and j
belong to the same bridge-connected component}.

Specifically, we compute:

Cross-bridge[u,k] := \/_j ((M")*[u,j] /\ Bridges[j,k]).

Note that for every u∈V0, Cross-bridge[u,k] = 1 iff k∈V(u).
Next, we replace each k in V(u) with the vertex v in V0 such
that v represents the bridge-connected component containing
k. This can be easily accomplished by computing:

(Broadcast rowwise:) B-rep[v,w] := B-rep[v,v];
(Broadcast columnwise:) B-rep'[w,v] := B-rep[v,v];

T0[u,v] := \/_k ((B-rep[u,k] /\ Cross-bridge[u,k])
                 /\ ((M")*[k,v] /\ B-rep'[k,v])).

The above statement should be interpreted as

T0[u,v] := (∃k)(u∈V0, and u crosses a bridge to reach k, where
k is in the same bridge-connected component as v, where v∈V0).

An adjacency matrix T0 for G0 is thus constructed.

From  Lemmas  3.1,  3.2,  and  Theorem  3.9,  we  have  the 
indicated  time  and  hardware  resource  complexities.  ■ 
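
The construction in the proof of Lemma 3.10 can be illustrated sequentially. The following is only a minimal sketch, assuming 0-based vertex numbers and plain Python lists of booleans in place of the processor registers; the names bool_closure and condense are hypothetical, and the closure is computed by Warshall's method rather than by the matrix multiplication of the MMM.

```python
def bool_closure(m):
    """Reflexive transitive closure of a 0/1 matrix (Warshall's method)."""
    n = len(m)
    c = [[bool(m[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if c[i][k]:
                for j in range(n):
                    if c[k][j]:
                        c[i][j] = True
    return c


def condense(closure, bridges):
    """Return (rep, t0) for the condensed forest.

    closure plays the role of (M")*, bridges the role of Bridges.
    rep[v] is True iff v is the smallest-numbered vertex of its
    bridge-connected component (the diagonal of B-rep).
    """
    n = len(closure)
    rep = [all(not closure[v][w] for w in range(v)) for v in range(n)]
    # Cross-bridge[u][k]: some j in the component of u has a bridge (j, k).
    cross = [[any(closure[u][j] and bridges[j][k] for j in range(n))
              for k in range(n)] for u in range(n)]
    # T0[u][v]: representative u crosses a bridge into the component of v.
    t0 = [[any(rep[u] and cross[u][k] and closure[k][v] and rep[v]
               for k in range(n))
           for v in range(n)] for u in range(n)]
    return rep, t0


# Two triangles {0,1,2} and {3,4,5} joined by the bridge (2,3).
n = 6
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
m = [[False] * n for _ in range(n)]
for a, b in edges:
    m[a][b] = m[b][a] = True
bridges = [[False] * n for _ in range(n)]
bridges[2][3] = bridges[3][2] = True
m2 = [[m[i][j] and not bridges[i][j] for j in range(n)] for i in range(n)]
rep, t0 = condense(bool_closure(m2), bridges)
```

In this example the representatives are vertices 0 and 3, and the only entries of t0 that are set are t0[0][3] and t0[3][0], i.e. the single edge of the condensed forest.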

Lemma 3.11: Given an adjacency matrix T0 of G0(V0,E0),
constructing the edge set A1 takes O(t(n)*(lg d + 1)) time with
H(n) hardware resources on the MMM, where d is the diameter
of G0.

Proof: First compute T0* and Rep (Theorem 3.6) and then
proceed in three steps.

In Step 1, we find the isolated vertices and select two
pendants from each tree in G0. These are the vertices having
degree ≤ 1. Therefore we begin by computing the degree of
each vertex u:

degree[u,u] := Σ_{k=1}^{n} T0[u,k], ∀u∈V, (Lemma 3.3(i));

(Note: degree[u,u] is the degree of u in G0.)

Based on degree, we compute Pen-iso[u,u] := (degree[u,u] ≤ 1)
locally at every processor. Then Pen-iso[u,u] = 1 iff u is a
pendant or an isolated vertex of G0. We assume
Pen-iso[u,v] = 0 for u ≠ v.

The remaining part of Step 1 is devoted to isolating two
pendants from each non-trivial tree of G0 and the isolated
vertices in G0. The isolated vertices are labelled with -1
while the two pendants from the same tree are labelled with
1 and 2 respectively. The remaining vertices are labelled
with 0. We begin by computing a boolean matrix Pi such
that Pi[u,v] = 1 iff v is a pendant of the tree represented by
u in G0 or v is an isolated vertex. Note that in the latter
case u must equal v.

(Broadcast columnwise:) Pen-iso'[w,u] := Pen-iso[u,u];
(Broadcast rowwise:) Rep[u,w] := Rep[u,u];

Pi[u,v] := Rep[u,v] /\ T0*[u,v] /\ Pen-iso'[u,v].

Using Pi, we rank the pendants of every non-trivial tree:

PRS[u,j] := Σ_{k=1}^{j} Pi[u,k], 1 ≤ j ≤ n, (Lemma 3.3(ii));

The pendants we select from each tree are those whose PRS
values (ranks) equal 1 or 2. We proceed to label the
isolated vertices and the selected pendants as follows.

For pendants, we compute:

label[u,v] := if ((PRS[u,v] ≤ 2) /\ Pi[u,v])
              then PRS[u,v]
              else 0;

The above statement should be interpreted as: if v is a
pendant of the tree represented by u and its rank in that
tree (PRS[u,v]) is 1 or 2, then label v with its rank, else
label v with 0.

For isolated vertices, we compute:

label[u,u] := if (PRS[u,u] = 1) /\ (degree[u,u] = 0)
              then -1;

This completes Step 1.
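
A sequential illustration of Step 1 may help. The sketch below is not the MMM implementation: it assumes 0-based vertex numbers, takes the smallest-numbered vertex of each tree as its representative, and uses a running counter in place of the prefix sums PRS. The name label_pendants is hypothetical.

```python
def label_pendants(t0):
    """Label the vertices of a forest given by a symmetric 0/1 matrix t0.

    Returns label[v] = -1 for isolated vertices, 1 and 2 for the two
    selected pendants of each non-trivial tree, and 0 otherwise.
    """
    n = len(t0)
    # reach[u][v]: u and v lie in the same tree (reflexive closure of t0).
    reach = [[u == v or bool(t0[u][v]) for v in range(n)] for u in range(n)]
    for k in range(n):
        for u in range(n):
            if reach[u][k]:
                for v in range(n):
                    if reach[k][v]:
                        reach[u][v] = True
    degree = [sum(1 for x in row if x) for row in t0]
    label = [0] * n
    for u in range(n):
        if any(reach[u][w] for w in range(u)):
            continue  # u does not represent its tree
        if degree[u] == 0:
            label[u] = -1  # isolated vertex
            continue
        rank = 0  # running prefix sum, playing the role of PRS[u, v]
        for v in range(n):
            if reach[u][v] and degree[v] == 1:
                rank += 1
                if rank <= 2:
                    label[v] = rank
    return label


# Path 0-1-2, isolated vertex 3, star with centre 4 and leaves 5, 6, 7.
n = 8
t0 = [[0] * n for _ in range(n)]
for a, b in [(0, 1), (1, 2), (4, 5), (4, 6), (4, 7)]:
    t0[a][b] = t0[b][a] = 1
labels = label_pendants(t0)
```

For this forest the path contributes pendants 0 and 2, the star contributes pendants 5 and 6 (leaf 7 has rank 3 and is not selected), and vertex 3 is marked as isolated.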

In Step 2, we rank the trees in G0 and pass the rank of
each non-trivial tree to its two selected pendants. Recall
that Rep[k,v] = Rep[k,k] = 1, ∀k,v∈V, in Step 1.

rank[i,i] := Σ_{k=1}^{i} Rep[k,i];

Clearly, rank[i,i] is the rank of the tree represented
by i in G0.

We then pass the rank of each non-trivial tree to its two
pendant vertices labelled with 1 or 2.

(Broadcast rowwise:) rank[k,v] := rank[k,k], ∀k,v∈V;

rank[v,v] := Σ_{k=1}^{n} (if Pi[k,v] then rank[k,v] else 0);

At this stage, rank[u,u] = rank[v,v] iff u,v are the two
pendants in the same tree in G0.

In Step 3, we construct the matrix A1 such that
A1[u,v] = 1 iff (u,v)∈A1:

Broadcast all the label[u,v]'s with value -1, 1 or 2
columnwise. Since there is at most one nonzero label in
each column, this broadcasting can be realized as
follows:

label'[u,v] := Σ_{k=1}^{n} label[k,v];

(Broadcast rowwise:) label[u,w] := label[u,u];
(Broadcast columnwise:) rank'[w,u] := rank[u,u];
(Broadcast rowwise:) rank[u,w] := rank[u,u];

Finally, execute the following statement locally at every
processor:

if {(label'[u,v] = -1) (i.e. v is an isolated vertex) and
    the ranks of u and v differ by exactly 1 and u is
    labelled with 1 or 2} or {u and v are both
    selected pendants but the rank of u is 1 greater
    than that of v while its label is 1 less than that
    of v, or the reverse}
then A1[u,v] := 1
else A1[u,v] := 0;

The above conditions can be easily tested by retrieving the
contents of the label, label', rank and rank' registers in
each processor.

From  Lemmas  3.1,  3.2,  3.3,  3.5  and  Theorem  3.6,  we  have 
the  indicated  time  and  hardware  resource  complexities.  ■ 


Lemma 3.12: Given an adjacency matrix T of a directed tree
whose diameter is d, labelling the vertices of the tree with
preorder numbers can be done in O(t(n)*(lg d + 1)) time with
H(n) hardware resources on the MMM.

Proof: We proceed in 3 steps.

In step 1, we compute nd(v), the number of descendants
of v, for every v. This is easily accomplished by first
computing T* and then adding all the 1's in each row of T*.
Recall that T*[u,k] = 1 iff k is a descendant of u.

nd[u,u] := Σ_{k=1}^{n} T*[u,k].

In step 2, we compute nds(v), the total number of
descendants of all elder brothers of v. Note that the sons
of every vertex are ranked by their vertex numbers:

(Broadcast columnwise:) nd'[u,j] := nd[j,j];

Compute the sum of all the descendants of the sons of u
whose vertex numbers are less than j. Note that j may not be
a son of u here:

nds[u,j] := Σ_{k=1}^{j-1} (if T[u,k]
                            then nd'[u,k]
                            else 0), (Lemma 3.3(ii));

Now, set nds[u,j] to zero if j is not a son of u.

nds[u,j] := if T[u,j] then nds[u,j] else 0;
Finally, in step 3, we compute pre(v), the preorder
number, using the formula given in Lemma 2.15.

(Broadcast nds columnwise:) nds'[k,j] := nds[u,j], where
F(j) = u; since every j has a single father, there is
at most one non-zero nds in each column, so Lemma 3.3
can be applied here.

(Recall again that T*[k,v] = 1 iff k is an ancestor of
v.)

pre[v,v] := Σ_k (nds'[v,k] * T*[k,v]);

then compute

pre[v,v] := pre[v,v] + level[v,v] locally at every
processor.

Clearly, pre[v,v] = pre(v).

From  Lemmas  3.1,  3.2,  3.3  and  3.5,  we  have  the 
indicated  time  and  hardware  resource  complexities.  ■ 
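
The preorder formula can be checked against an ordinary traversal. The sketch below is an illustration under stated assumptions, not the statement of Lemma 2.15: vertices are numbered from 0, the tree is given as a hypothetical parent array, a vertex counts as its own ancestor and descendant (T* taken as reflexive), and the root has level 1. Both function names are hypothetical.

```python
def preorder_by_formula(parent):
    """pre(v) = level(v) + sum of nds(k) over all ancestors k of v."""
    n = len(parent)
    # anc[v]: v followed by its proper ancestors up to the root.
    anc = []
    for v in range(n):
        chain, k = [v], parent[v]
        while k is not None:
            chain.append(k)
            k = parent[k]
        anc.append(chain)
    level = [len(chain) for chain in anc]
    # nd[k]: number of descendants of k, counting k itself.
    nd = [0] * n
    for v in range(n):
        for k in anc[v]:
            nd[k] += 1
    # nds[v]: total nd of the elder brothers of v (smaller vertex number).
    nds = [0] * n
    for v in range(n):
        p = parent[v]
        if p is not None:
            nds[v] = sum(nd[s] for s in range(v) if parent[s] == p)
    return [level[v] + sum(nds[k] for k in anc[v]) for v in range(n)]


def preorder_by_dfs(parent):
    """Reference preorder numbering, sons visited by increasing number."""
    n = len(parent)
    children = [[] for _ in range(n)]
    root = None
    for v, p in enumerate(parent):
        if p is None:
            root = v
        else:
            children[p].append(v)
    pre, stack, count = [0] * n, [root], 0
    while stack:
        v = stack.pop()
        count += 1
        pre[v] = count
        stack.extend(reversed(children[v]))
    return pre


# Root 0 with sons 1, 2, 6; vertex 1 has sons 3, 4; vertex 2 has son 5.
parent = [None, 0, 0, 1, 1, 2, 0]
by_formula = preorder_by_formula(parent)
by_dfs = preorder_by_dfs(parent)
```

On this tree both functions assign the preorder numbers 1, 2, 5, 3, 4, 6, 7 to vertices 0 through 6.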

Lemma 3.13: Given an adjacency matrix M of an undirected
tree G whose diameter is d", constructing the edge set A2 to
bridge-connect G can be done in O(t(n)*(lg d" + 1)) time with
H(n) hardware resources.

Proof: Construct an adjacency matrix T for a (directed) BFS
spanning tree of G and choose a vertex with degree greater
than 1 as the root. Note that when n = 2, there is no way to
bridge-connect G without introducing parallel edges. It is
therefore reasonable to assume n ≥ 3, and this implies that a
vertex of degree greater than 1 must exist. This step
effectively converts G into a directed tree whose root has
at least two sons.

Next, compute the preorder numbers pre(v), ∀v∈V (Lemma
3.12). Note that pre[v,v] = pre(v). Then find the pendants and
sort them by preorder number:

notleaf[u,u] := \/_{k=1}^{n} T[u,k], (Lemma 3.3(i));

Recall that T is a directed tree; notleaf[u,u] = 0 iff u is
a pendant.

Erase the preorder number of all non-pendants:

pre[u,u] := if ¬notleaf[u,u] then pre[u,u] else 0;

Order the pendants by preorder numbers:

(Broadcast rowwise:) pre[u,w] := pre[u,u];
(Broadcast columnwise:) pre'[w,u] := pre[u,u];

Rank-leaf[u,u] := Σ_{k=1}^{n} (pre[u,k] ≤ pre'[u,k]),
                  (Lemma 3.3(i));

Since the root is a non-pendant, Rank-leaf[u,u] < n if u is a
pendant, and Rank-leaf[u,u] = n if u is a non-pendant. As a
result, the non-pendants can be eliminated easily:

Rank-leaf[u,u] := if (Rank-leaf[u,u] < n)
                  then Rank-leaf[u,u]
                  else 0;

Thus, the pendants are sorted by preorder number and
Rank-leaf[u,u] indicates the position of u in the sorted
sequence.

Now, determine the total number of pendants:

T-leaf[u,u] := Σ_{k=1}^{n} (pre'[u,k] > 0), (Lemma 3.3(i));

T-leaf[u,u], ∀u∈V, contains the total number of pendants
in the directed tree.

Finally, we construct A2 as follows.

For all u such that Rank-leaf[u,u] > 0, compute the rank of v
such that (u,v) is to be inserted into A2:

Partner-rank[u,u] := if (Rank-leaf[u,u] ≤ ⌈T-leaf[u,u]/2⌉)
                     then Rank-leaf[u,u] + ⌊T-leaf[u,u]/2⌋;

Note that the division can be realized by 'right shifting one
bit'. Finally,

(Broadcast rowwise:)
Partner-rank[u,w] := Partner-rank[u,u];
(Broadcast columnwise:)
Rank-leaf'[w,u] := Rank-leaf[u,u];

A2[u,v] := (Rank-leaf'[u,v] = Partner-rank[u,v]).

Up to this point, we have indeed constructed a
'directed' edge set A2 rather than the desired 'undirected'
edge set. In order to complete the construction of A2, we
may construct the transpose of the (directed) A2 just
constructed. This process is exactly the same as that
described in Lemma 3.7; the discussion is thus omitted.

From Lemmas 3.1, 3.2, 3.3, 3.12 and Theorem 3.6, we
have the indicated time and hardware resource complexities. ■
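
The pairing of pendants by rank can be sketched as follows. This is only an illustration under an assumed reading of the Partner-rank rule, namely that the pendant of rank i, for i up to the ceiling of p/2, is joined to the pendant of rank i plus the floor of p/2, where p is the number of pendants. The function name pair_leaves and the list representation are hypothetical.

```python
def pair_leaves(leaves_in_order):
    """Pair the pendant of rank i with the pendant of rank i + floor(p/2).

    leaves_in_order lists the pendants in their sorted (ranked) order;
    ranks start at 1. Only ranks i <= ceil(p/2) receive a partner.
    """
    p = len(leaves_in_order)
    half = p // 2                      # floor(p/2), a one-bit right shift
    pairs = []
    for i in range(1, (p + 1) // 2 + 1):
        j = i + half
        if j <= p:
            pairs.append((leaves_in_order[i - 1], leaves_in_order[j - 1]))
    return pairs


even_pairs = pair_leaves(["a", "b", "c", "d"])
odd_pairs = pair_leaves(["a", "b", "c", "d", "e"])
```

With four pendants the sketch yields the pairs (a,c) and (b,d); with five it yields (a,c), (b,d) and (c,e), so the middle pendant takes part in two added edges.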

Theorem 3.14: The bridge-connectivity augmentation problem
can be solved in O(t(n)*(max(lg d, lg d") + 1)) time with H(n)
hardware resources on the MMM, where d" is the diameter of
G0(V0,E0∪A1).

Proof: Immediate from Lemmas 3.10, 3.11, 3.12, 3.13 and the
observation that d" can be greater than d. ■

3.9  The  Biconnectivity  Problem 

In this section, we shall implement Algorithm Biconnect
of Chapter 2 on the MMM. We shall determine the set of all
separation vertices at the same time.

Theorem 3.15: Given an adjacency matrix M of an undirected
graph G, the set of all separation vertices and biconnected
components in G can be determined in
O(t(n)*(max(lg d, lg d") + 1)) time with H(n) hardware resources
on the MMM, where d" is the diameter of G" defined in
Section 2.10.2.

Proof: We proceed in 5 steps.

Steps 1-3 are the same as Steps 1-3 of the algorithm
for bridge-connectivity; their discussions are thus omitted.
Recall that after Step 3, level[v,k] = level(v),
lHLCA[v,v] = lHLCA(v), ∀v,k∈V, and the adjacency matrix of
G-T is available.

In Step 4, we shall construct an adjacency matrix M"
for G"(E',E") and determine the connected components of G":

(Broadcast rowwise:) lHLCA[v,k] := lHLCA[v,v], ∀v,k∈V;
(Broadcast columnwise:) level'[k,v] := level[v,v], ∀v,k∈V;
(Broadcast columnwise:) lHLCA'[k,v] := lHLCA[v,v], ∀v,k∈V;

(Consider  if  HLCA  (u)  i-VSu)  : 

M"[u,v] := (T')*[u,v] /\ (level'[u,v] > lHLCA[u,v]);

(Consider  if  HLCA (v)  i-uSv)  : 

M"[u,v] := M"[u,v]
           \/ (T*[u,v] /\ (level[u,v] > lHLCA'[u,v]));

(Consider  the  non-tree  edges): 

Af"  [l/,v/]  :=AT  Ci7,v/]  V  AT  [l/,\/]; 

(* Note: Since each v uniquely determines F(v), we
conveniently use v to represent (F(v),v) in the vertex set
of G" here *)

Now compute (M")*; this determines the connected
components of G".

In Step 5, we shall determine the set of all separation
vertices of G. Recall that F[v,v] = F(v), ∀v∈V, after Step 1.

(Broadcast columnwise:) F'[k,v] := F[v,v], ∀v,k∈V.

Compute the matrix Subroot such that Subroot[v,v] contains
the root of the subtree (of the BFS spanning tree)
containing v:

∀v≠r: Subroot[v,v] := min_k {if (M")*[v,k] then F'[v,k]},
                      (Lemma 3.3(i));

Once Subroot is computed, constructing the matrix Spt
such that Spt[v,v] = 1 iff v is a separation vertex of G and
adding these separation vertices to the connected components
of G" to form the biconnected components of G should be
straightforward. We omit the details here.

By Theorem 3.6 and Lemmas 3.1, 3.2, 3.3, 3.5, and noting
that d" can be O(n) even if d ≪ n, we have: the
biconnectivity problem can be solved in
O(t(n)*(max(lg d, lg d") + 1)) time with H(n) hardware resources
on the MMM. ■

3.10 Performance on Existing Models

The aim of this section is to strengthen the results we
have achieved in the previous sections by showing that the
MMM includes many of the well-known existing computer models
as its special cases and that our algorithms perform
efficiently on all these models.


Lemma 3.16: The following computer models are instances of
the MMM:

MCN(VLSI) : Mesh-connected Networks [CANN69, DEKE81, ATAL82];
PSN(VLSI) : Perfect Shuffle Networks [STON71, DEKE81];
CCC(VLSI) : Cube-connected Cycles [PREP81];
OTN(VLSI) : Orthogonal Tree Networks [NATH82];
OTC(VLSI) : Orthogonal Tree Cycles [NATH82];
SIMD-CCC  : SIMD Cube-connected Computers [DEKE81];
PRAM      : SIMD Shared Memory Model with read conflicts
            permitted [WYLL79];
WRAM      : SIMD Shared Memory Model with read and write
            conflicts permitted [SHIL82, KUCE82].

Proof: We show in the following table that each of these
models has an ordinary matrix multiplication algorithm. For
other features of the MMM that they possess, we refer the
reader to the references cited.

The time and hardware resource complexities of the ordinary
matrix multiplication algorithms on the above-listed models

model       time        chip area     AT2              # of processors
MCN(VLSI)   O(n)        n^2           O(n^4)           -   [DEKE81]
PSN(VLSI)   O(lg n)     n^6/lg^3 n    O(n^6*lg n)      -   [DEKE81]
CCC(VLSI)   O(lg n)     n^6/lg^2 n    O(n^6*lg^2 n)    -   [PREP81]
OTN(VLSI)   O(lg^2 n)   n^4 lg^2 n    O(n^4*lg^6 n)    -   [NATH81]
OTC(VLSI)   O(lg n)     n^4           O(n^4*lg^4 n)    -   [NATH81]
SIMD-CCC    O(lg n)     -             -                ⌈n^3/lg n⌉ [DEKE81]
PRAM        O(lg^2 n)   -             -                ⌈n^3/lg n⌉ [SAVA77]
WRAM        use the algorithm for the PRAM.

The lemma thus follows. ■

Before combining Lemma 3.16 with the results obtained
in the previous sections to produce the desired results, we
would like to point out that the time complexity of our
algorithms is dominated by the all-pair shortest path
algorithm which is used to generate the BFS spanning forest.
If for a particular MMM there exists an all-pair shortest
path algorithm which runs faster than our algorithm
described in Lemma 3.4, then that all-pair shortest path
algorithm could be used in place of ours and the time
complexity of the resulting algorithms is improved. This is
the case for the MCN and WRAM, as is shown below.


The time and hardware resource complexities of our
algorithms on various existing models.

model       time          chip area     AT2                   # of processors
MCN(VLSI)   O(n) †        n^2           O(n^4)                -   [VANS80]
PSN(VLSI)   O(lg n*L)     n^6/lg^3 n    O(L^2*n^6/lg n)       -
CCC(VLSI)   O(lg n*L)     n^6/lg^2 n    O(n^6*L^2)            -
OTN(VLSI)   O(lg n*L)     n^4 lg^2 n    O(n^4 lg^4 n*L^2)     -
OTC(VLSI)   O(lg n*L)     n^4           O(n^4 lg^2 n*L^2)     -
SIMD-CCC    O(lg n*L)     -             -                     ⌈n^3/lg n⌉
PRAM        O(lg n*L)     -             -                     ⌈n^3/lg n⌉
WRAM        O(L) †        -             -                     n^4 [KUCE82]

Note: L = max(lg d, lg d") + 1, 1 ≤ d,d" ≤ n, for bridge-connectivity
        augmentation and biconnectivity;
        = lg d + 1, otherwise.

† indicates the all-pair shortest path algorithm in the
cited reference is used instead of Lemma 3.4.

For none of the above-mentioned models was an algorithm for
the bridge-connectivity augmentation problem previously
known. Furthermore, with the exception of the MCN and
PRAM, no algorithms for the bridge-connectivity and
biconnectivity problems were reported. For the sake of
comparison, we list all the previously known results below:


The time and hardware resource complexities of the
previously known algorithms on various existing models.

model       time               chip area    AT2              # of processors

(i) The BFS spanning forest
MCN(VLSI)   O(n)               n^2          O(n^4)           -   [ATAL82]
PSN(VLSI)   O(lg^2 n)          n^6/lg n     O(n^6 lg^3 n)    -   [DEKE81]
SIMD-CCC    O(lg^2 n)          -            -                ⌈n^3/lg n⌉ [DEKE81]

(ii) The lowest common ancestors
PRAM        O(lg^2 n)          -            -                n^3 [SAVA81]

(iii) The fundamental cycles
PRAM        O(lg^2 n)          -            -                n^3 [SAVA81]

(iv) The 2-colorability (Bipartite)
MCN         O(n)               n^2          O(n^4)           -   [ATAL82]

(v) Bridge-connectivity and Biconnectivity
MCN(VLSI)   O(n)               n^2          O(n^4)           -   [ATAL82]
PRAM        O(lg^2 n)          -            -                ⌈n^3/lg n⌉ [SAVA81]
   or       O(lg^2 n lg K) ‡   -            -                |E|n + n^2 lg n [SAVA81]
   or       O(lg^2 n)          -            -                n^2*lg n [SAVA81] †

Note: † for bridge-connectivity only;
      ‡ K is the number of biconnected components in the
        graph.

The  efficiency  of  our  algorithms  should  be  evident  from 
the  tables. 

Finally,  we  shall  prove  a  lemma  which  would  be  useful 
in  employing  existing  results  to  improve  the  performance  of 
our  algorithms  on  the  PRAM  and  the  WRAM. 

Lemma 3.17: Converting an undirected forest into an inverted
(or directed) forest takes O(lg n) time with n^3 processors on
the PRAM and the WRAM.

Proof: We shall find a directed spanning forest for the
undirected forest using the all-pair shortest path method
described in Lemma 3.6. Since there is a unique path between
every pair of vertices in the undirected forest, only O(1)
time is required in each of the O(lg n) iterations if n^3
processors are used. Specifically, this is accomplished as
follows: Assign n processors to each pair of vertices u and
v such that each of these n processors is attached to a
distinct vertex. During the i-th iteration of executing the
all-pair shortest path algorithm, the processor attached to
vertex, say w, will examine the entries D^(2**(i-1))[u,w] and
D^(2**(i-1))[w,v]. If both of their values are finite and
D^(2**(i-1))[u,w] = 2**(i-1), then that processor will add their
values and store the sum into D^(2**i)[u,v]. It is easily
verified that there is exactly one such processor finding
the above condition satisfied, hence no write conflicts
would occur on the PRAM. ■

As  the  first  application  of  Lemma  3.17,  we  shall  show 
that  the  processor  bound  of  our  algorithms  on  the  WRAM  can 
be  improved  to  0(n3). 

Corollary 3.18: All of our algorithms described in this
Chapter run in O(lg n) time using n^3 processors on the WRAM.

Proof: Construct a minimum spanning forest for the given
graph in O(lg n) time with n+2|E| processors [AWER83]. Convert
the minimum spanning forest into a directed forest using
Lemma 3.17. It is easily verified that the remaining steps
all take no more than O(lg n) time and n^3 processors. ■

3.11  Conclusions 

In contrast to sequential computation, where the
sequential RAM is chosen as a universally accepted model,
there is no universally accepted model in parallel
computation. Up to the present, the parallel computer model
which has had the greatest degree of popularity is the PRAM.
This is due to its powerful fan-out capability, which
provides a means by which all the physical constraints
inherent in the interconnection network are bypassed. The
designer can thus concentrate on uncovering the inherent
data-dependency of the given problem. This makes the task of
algorithm design much easier. As a consequence, parallel
algorithms published in the literature are mostly designed
for the PRAM or its stronger version, the WRAM.
Unfortunately, this fan-out capability is unrealistically
powerful in the sense that it cannot be realized with
current technology. Its acceptability as a universal model
is therefore questionable.

The  restricted  models  which  take  the  technological 
constraints  under  consideration  are  preferable  from  the 
practical  point  of  view  since  they  are  well-suited  for 
current  VLSI  technology.  However,  the  constraints  imposed  by 
their  limited  fan  in/out  capability  tend  to  obscure  the 
designer's  insight  and  make  the  design  of  efficient 
algorithms  more  difficult.  Furthermore,  portability  between 
these  models  is  weaker  due  to  the  vast  variety  of  ways  of 
constructing  the  interconnection  network.  To  remedy  the 
first  drawback,  one  may  first  design  an  algorithm  for  the 
problem  on  the  PRAM  and  then  map  the  algorithm  onto  the 
restricted model at hand. In fact, some work has been done
using this approach [SCHW80, VISH81b]. However, a degradation
in time complexity (at least a factor of lg n in the existing
works) has always been induced. To remedy the second
drawback, we may simulate one model on the other. So far,
with the exception of [SIEG77] and [SIEG79], these
simulations have been done at the abstract level.

In  our  opinion,  the  MMM  model  proposed  in  this  chapter 
provides  a  better  solution  to  the  above  problems.  By 
reducing  many  of  the  basic  operations  we  use  into  operations 
of  the  form  defined  in  Lemma  3.1,  we  have  managed  to 
demonstrate  that  the  algorithms  presented  in  Chapter  2  for 
the  PRAM  can  be  implemented  on  many  of  the  existing 
restricted  models  with  no  degradation  in  time.  Moreover,  the 
portability  of  these  algorithms  on  various  models  is 
immediate  —  no  tedious  simulation  is  necessary.  Thus,  the 
MMM model seems to be a promising tool for designing
portable algorithms.

Chapter 4

IMPLEMENTATION ON THE SEQUENTIAL RAM


4.1 Introduction

In Chapter 1, it was mentioned that given a parallel
algorithm for the PRAM, if the algorithm runs in T(n) time
using P(n) processors, then the same algorithm can run on
the sequential RAM in T(n)*P(n) time. An implication of this
observation is that each of the algorithms presented in
Chapter 2 immediately induces an O(n^2) or O(n^2 lg n) time
algorithm for the sequential RAM. Although this result is
optimal for dense graphs, we shall show that we can do
better for sparse graphs for some of the problems. In this
chapter, we present a sequential version of Algorithm
Biconnect which finds all the biconnected components as well
as all the separation vertices of an undirected graph. This
algorithm requires O(n+|E|) time and space, which is optimal
for all graphs. Moreover, it does not rely on the well-known
depth-first search spanning tree but uses any spanning tree
of the graph. Thus, this is another example to show that
depth-first search is not always necessary for dealing with
connectivity properties of graphs 'efficiently' (the first
example was given by Tarjan in [TARJ74] concerning finding
all bridges). It is also shown that this algorithm is a
generalization of Tarjan's depth-first search algorithm
presented in [TARJ72]. The algorithm also detects all
bridges and hence the bridge-connected components of the
graph within the same time and space bounds.

We also present a general program scheme for the
bridge-connectivity problem. This general program scheme
runs on the sequential RAM in max(O(n+|E|), T(g,θ1,θ2)) time
and max(O(n+|E|), S(g,θ1,θ2)) space, and on the PRAM in
max(O(n/K + lg^2 n), T(g,θ1,θ2)) time with nK (K>1) processors,
where g, θ1, θ2 are parameters of the general program
scheme. Clearly, the optimality of the program scheme
depends on the complexities of T(g,θ1,θ2) and S(g,θ1,θ2). We
shall show that by substituting several appropriate
functions for the parameters g, θ1 and θ2, we can derive
most of the existing optimal sequential algorithms as well
as new optimal parallel algorithms including Algorithm
Bridges presented in Chapter 2 for finding the bridges.

4.2  The  Sequential  Algorithm  for  Biconnectivity 

In  this  section,  we  present  a  sequential  algorithm  for 
finding  all  biconnected  components  and  all  separation 
vertices  of  an  undirected  graph.  As  with  Chapter  2,  since 
each  biconnected  component  is  completely  determined  by  its 
vertex  set,  it  suffices  to  find  the  vertex  sets  of  all  the 
biconnected  components. 

Let G(V,E) be an undirected graph. Without loss of
generality, we again assume that G is connected and
V = {1,2,...,n}. We also use the function HLCA(u) defined in
Chapter 2. However, we redefine it here because there is a
slight modification involved.

Definition: Let T(V,E') be a directed spanning tree of G and
u∈V,

HLCA(u) = LCA(u,v) in T, where (u,v)∈E and
depth(LCA(u,v)) ≤ depth(LCA(u,v')), ∀(u,v')∈E.

4.2.1  An  Outline  of  the  Algorithm 

We  give  an  outline  of  the  algorithm  below: 

Algorithm  Seq-biconnect : 

1. Create a spanning tree T' of G;

2. Convert T' to a directed tree T(V,E'); again let the
functions F and depth be such that F(v), depth(v) are the
father and depth of v in T respectively, ∀v∈V.

3. Partition T into connected subgraphs, called
trimmed-subtrees {T_i}, such that each of them has the
following properties:

(i) Each T_i is a directed tree whose root has exactly one
son;

(ii) a. let r_i be the root of a T_i; for any vertex v≠r_i in
T_i, HLCA(v) is a descendant of r_i;

b. for every internal vertex v≠r_i in a T_i, there exists
a proper descendant d of v for which HLCA(d) is a
proper ancestor of v;

c. let l be a leaf-node of a T_i; then for every proper
descendant d of l in T, HLCA(d) is a descendant of l;

4. Construct a graph G" such that V" = {T_i} and
(T_k,T_m)∈E" iff there exists an edge e in E connecting T_k
and T_m and the end-vertices of e are not the roots of
the two T's. Find all the connected components {{T_i}_j} in
G"; then each ∪_i {T_i(V)}_j is the vertex set of a
biconnected component in G, and vice versa, where T_i(V)
is the vertex set of T_i. ■

4.2.2  Partitioning  the  Directed  Tree 

The  input  to  the  algorithm  is  an  adjacency  list  of  G. 
Steps  1  and  2  are  trivial  and  can  clearly  be  done  in 
0(n+\E\)  time  and  space.  The  resulting  directed  spanning 
tree  7  is  represented  by  an  adjacency  list  which  takes  0(n) 
space . 

To realize the partition {T_i} of T in step 3, we will
traverse the directed spanning tree T in preorder and label
every vertex with its preorder number. Henceforth, we will
name each vertex by its preorder number, i.e. v = pre(v).

Definition: For v∈V,

low(v) = min{w | w = HLCA(x), x is a descendant of v in T}.

For example, in Figure 4.1(i), low(3) = 1 and low(15) = 9.
Due to the associativity of min, the above equation can be
rewritten as:

low(v) = min({HLCA(v)} ∪ {low(s) | s is a son of v in T}).

The complete description of step 3 is as follows.

1. precount := 1; compute F(v), depth(v), ∀v∈V; compute
HLCA(v), ∀v∈V, using the off-line lowest common ancestors
algorithm presented in [HARE80].

2. CreateTi(r), where r is the root of T.


procedure CreateTi(v);
begin
  pre(v) := precount;
  precount := precount + 1;
  Push v on stack stackT;
  low(v) := HLCA(v);
  for every son s of v do
  begin
    CreateTi(s);
    if low(s) = pre(v)
      then pop stackT until s is popped and then output v
      else low(v) := min(low(v), low(s))
  end
end {of CreateTi};
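
Procedure CreateTi can be tried out with the following Python sketch. It is an illustration only: HLCA is computed by a naive walk up the father pointers rather than by the off-line algorithm of [HARE80], so the linear time bound is not preserved; vertices are numbered from 0; and the names trimmed_subtrees, parent and edges are hypothetical. The example tree happens to be a depth-first search tree, in which case the trimmed-subtrees already coincide with the biconnected components; in general Step 4 is still needed.

```python
def trimmed_subtrees(parent, edges):
    """Return the vertex sets output by CreateTi on the rooted tree.

    parent[v] is the father of v (None for the root); edges lists all
    edges of G, tree edges included.
    """
    n = len(parent)
    root = parent.index(None)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p is not None:
            children[p].append(v)

    def depth(v):
        d = 0
        while parent[v] is not None:
            v, d = parent[v], d + 1
        return d

    def lca(a, b):
        da, db = depth(a), depth(b)
        while da > db:
            a, da = parent[a], da - 1
        while db > da:
            b, db = parent[b], db - 1
        while a != b:
            a, b = parent[a], parent[b]
        return a

    # hlca[u]: shallowest LCA(u, v) over all edges (u, v) of G.
    hlca = list(range(n))
    for a, b in edges:
        c = lca(a, b)
        for u in (a, b):
            if depth(c) < depth(hlca[u]):
                hlca[u] = c

    pre, low, stack, out = {}, {}, [], []

    def create(v):
        pre[v] = len(pre) + 1          # preorder number of v
        stack.append(v)
        low[v] = pre[hlca[v]]          # HLCA(v) is an ancestor, already numbered
        for s in children[v]:
            create(s)
            if low[s] == pre[v]:
                block = [v]
                while True:            # pop stackT until s is popped
                    x = stack.pop()
                    block.append(x)
                    if x == s:
                        break
                out.append(sorted(block))
            else:
                low[v] = min(low[v], low[s])

    create(root)
    return out


# Triangle 0-1-2 with a pendant edge (2,3); tree path 0 -> 1 -> 2 -> 3.
parent = [None, 0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
blocks = trimmed_subtrees(parent, edges)
```

On this graph the procedure first outputs {2,3} and then {0,1,2}.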

An example of the result of executing step 3 on the
graph in Figure 4.1(i) is given in Figure 4.1(ii).


Theorem 4.1: Step 3 correctly generates the set of all
trimmed-subtrees {T_i}.

Proof: We want to prove that whenever low(s) = v, the vertices
on stackT from s right up to the top plus vertex v
constitute the vertex set of a T_i. This is done by induction
on the number of T_i's in T.

If T has only one T_i, then the proof is trivial.

Assume that the induction hypothesis holds for all T
having m T_i's. Consider a T having m+1 T_i's. Let CreateTi(s)
be the first call of CreateTi ending with low(s) = v. This
means no vertices have been popped from stackT. Therefore,
the vertices on stackT from s to the top and vertex v form
the vertex set of a subtree Tv of T rooted at v. Tv clearly
possesses properties (i) and (ii)c. low(s) = v implies that Tv
possesses property (ii)a. Finally, Tv must possess property
(ii)b, for otherwise there is a proper descendant w of s for
which low(ws) = w, where ws is a son of w. This contradicts the
assumption that CreateTi(s) is the first call ending with
low(s) = v. Thus, Tv is a T_i of T. After removing the vertices
in Tv from stackT, the induction hypothesis ensures that
step 3 correctly generates the remaining m T_i's. ■

Figure 4.1(i)
A directed spanning tree T(V,E').
The solid lines are the tree edges.
The dotted lines are the edges in G-T.

Figure 4.1(ii)
The partition {T_i} of T.

Figure 4.1(iii)
The partition {BC_j} of T.

The complexity of Step 3 is analyzed as follows.

Theorem 4.2: Step 3 of Algorithm Seq-biconnect takes
O(n+|E|) time and space on the sequential RAM.

Proof: Traversing the spanning tree T and maintaining the
stack stackT takes O(n) time. Computing HLCA(v), ∀v∈V, takes
O(n+|E|) time and space [HARE80]. Moreover, neither the stack
stackT nor the stack governing the traversal of T grows
beyond n units of space. ■

4.2.3 Combining the Trimmed-subtrees

After step 3 is finished, the directed spanning tree T
is partitioned into trimmed-subtrees T_i's. From Lemma
2.25(i),(ii) and property (ii) of T_i, it is easily shown
that each T_i is contained within a unique biconnected
component in G. It is also easily shown that every two
adjacent T_i's intersect at no more than one vertex. If,
however, two adjacent T_i's are connected by an edge in G-T
which is not incident with either of their roots, then
they should be combined, as they are contained
within the same biconnected component (Lemma 2.25(iii)). In
the following, we will show that when no such combination
can be carried out any further, the result is the vertex
sets of all the biconnected components in G.

As an example, consider Figure 4.1(ii) again. For
clarity, we denote each trimmed-subtree in the figure by
T_s(i), where s(i) is the preorder number of the unique son of
the root of the trimmed-subtree. For instance, the
trimmed-subtree containing vertices 9, 15, 16, 17, 18, and
19 is denoted by T_15. Hence the directed tree in Figure
4.1(i) is divided into trimmed-subtrees T_2, T_7, T_20, T2 ^r,
T_6, T_11, T_13, T_14, T_15, T 2 n, T2 s and T_27 in Figure 4.1(ii).
T_2 and T_7 are connected by an edge (5,9) in G-T and neither
5 nor 9 is the root of T_2 or T_7. It can be easily seen that
T_2 and T_7 are indeed contained within the same biconnected
component. Similarly, the edge (5,6) joining T_2 and T_6
implies that T_2 and T_6 are contained within the same
biconnected component. Consider again the edge (5,9); this
edge also connects T_2 and T_15. However, since 9 is the root of
T_15, the edge does not imply that T_2 and T_15 are contained
within the same biconnected component (in fact, they are not).
The same argument applies to the edge (23,26), which connects
T 2 u and T_27, and the edge (12,10), which connects T_11 and T_7.

Definition: Let T_1,T_2∈{T_i} be two trimmed-subtrees. T_1 ~ T_2
iff (i) T_1=T_2;
or (ii) there exists an edge e in G-T such that e
connects T_1 and T_2, and e is not incident with
r_1 or r_2, where r_1 and r_2 are the roots of T_1
and T_2 respectively;
or (iii) T_1 ~ T_3 for some T_3∈{T_i} such that T_3 ~ T_2.

It can be easily shown that any edge in G-T violating
the criteria given in the above definition is of one of the
types depicted in Figure 4.2.

The binary relation ~ is an equivalence relation on
{T_i} and thus partitions {T_i} into equivalence classes
{{T_i}_j}. Let BC_j = ∪_j {T_i}_j. The following theorem points out
that the vertex sets of all the BC_j's are exactly the vertex
sets of all the biconnected components in G.

Theorem 4.3: v,v' are in B, for some biconnected component B
of G, iff v,v'∈BC_j(V), for some j, where BC_j(V) stands for
the set of all vertices in BC_j.

Proof: If part: From Lemma 2.25(iii), it is obvious that
each BC_j is completely contained within a biconnected
component of G.

Only if part: This is proven by contradiction. Without loss
of generality, let us assume that BC_k and BC_m are distinct,
and BC_k(V)∪BC_m(V)=B(V). It should be clear that BC_k and BC_m
intersect at no more than one vertex, which is the root of
either BC_k or BC_m. Without loss of generality, we assume
they intersect at r_k, the root of BC_k. Since r_k cannot be a
separation vertex in B, there must be an edge in B joining
BC_k and BC_m not incident with r_k. Obviously, this edge is
not incident with r_m either, for otherwise r_m would be in
T_k, forcing r_k=r_m, which contradicts the fact that the edge
is not incident with r_k. This implies that the two BC_j's would
have the '~' relationship, leading to a contradiction. ■

Figure 4.2 Non-tree edges violating the ~ definition

Lemma 4.4: The problem of finding the set of all BC_j's in T
can be reduced to the problem of finding the set of all
connected components of an undirected graph.

Proof: Define a graph G"({T_i},E") such that (T_k,T_m)∈E" iff
there exists an edge e in G-T such that e connects T_k and T_m
and e is not incident with either of the roots of the two
T_i's. It is clear that T_k, T_m belong to the same connected
component of G" iff T_k ~ T_m. ■

It should be clear that every BC_j is a directed
spanning tree of its corresponding biconnected component. In
fact, {BC_j}={B_j∧T} defined in Section 2.10.3. Consequently,
we have:

Theorem 4.5: Let a∈V. a is a separation vertex of G
iff a is the root of some BC_j, if a≠r,
or a is the root of more than one BC_j, if a=r.

Proof: See Lemma 2.29. ■

Each trimmed-subtree T_i determined in step 3 is
represented by a linear list containing all the vertices of
T_i except the root r_i. The reason for excluding r_i should be
obvious, as it may belong to other T_i's at the same time.
Nevertheless, r_i can be recovered easily as r_i=F(s_i), where
s_i is the only son of r_i in T_i. s_i is also used as the
representative of T_i in G". The linear lists are created
while the vertices are popped from stackT in procedure
CreateTi. Note that s_i is the last vertex popped from the
stack and is therefore easily identified. A vector superV is
also created at the same time such that for each vertex v in
G, superV(v)=s_i iff v is in T_i and v≠r_i. The purpose of
superV is to tell to which T_i each vertex v belongs. The
exclusion of r_i from the list ensures that the situations
depicted in Figure 4.2(i),(ii) are always handled correctly.
In other words, the edges shown would never be mistaken for
edges establishing the ~ relationship between the T_k and T_m
shown. As a consequence, the edges which must be taken care
of are those edges in which one end-vertex is r_k and the
other end-vertex is in T_k (Figure 4.2(iii)).

To create the graph G"(V",E") in step 4, an adjacency
list of G" must be created. We proceed as follows. The
linear lists for the T_i's are scanned one at a time. Suppose
the linear list being examined corresponds to T_k; then for
each vertex v stored in the linear list, the adjacency list
of v in G is scanned. For each node u encountered in the
adjacency list, the following tests are performed:

(i) test if superV(u)≠s_k;

(ii) test if F(s_k)≠u;

(iii) test if F(superV(u))≠v.

Test (i) is to ensure u,v do not belong to the same
trimmed-subtree T_k, while tests (ii) and (iii) are done to
ensure that the edge (v,u) is not an edge of the form shown
in Figure 4.2(iii). Note that no tree edges pass the tests.
If the edge passes the tests, then a new node containing the
vertex superV(u) is added to the adjacency list of s_k in G",
thereby establishing the '~' relationship between s_k and
superV(u). When all the linear lists are processed, the
adjacency list of G" is complete. Note that G" may be a
multigraph (i.e. there may be more than one edge joining two
vertices). However, |E"|<|E|.
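The scan with tests (i) to (iii) can be sketched as follows. The function and parameter names are hypothetical, and the sketch makes one assumption that the text leaves implicit: the root r of T appears in no linear list and therefore has no superV entry, so an edge leading to r is skipped (such an edge is incident with the root of every trimmed-subtree that contains r).

```python
def build_g2(lists, adj, F):
    """Build the adjacency lists of G" from the trimmed-subtree lists.

    lists: representative s_k -> vertices of T_k without its root
    adj:   adjacency lists of G
    F:     father of every non-root vertex of T
    (All three names are illustrative, not from the thesis.)
    """
    super_v = {v: s for s, members in lists.items() for v in members}
    g2 = {s: [] for s in lists}
    for s_k, members in lists.items():
        for v in members:
            for u in adj[v]:
                su = super_v.get(u)
                if su is None:
                    continue  # assumption: u is the root r of T
                # tests (i), (ii) and (iii)
                if su != s_k and F[s_k] != u and F[su] != v:
                    g2[s_k].append(su)
    return g2


# 4-cycle 1-2-3-4-1 with tree edges (1,2), (2,3), (1,4): the non-tree
# edge (3,4) links the two trimmed-subtrees represented by 2 and 4.
cycle = build_g2({2: [3, 2], 4: [4]},
                 {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]},
                 {2: 1, 3: 2, 4: 1})
```

Because every qualifying edge is seen once from each end-vertex, each representative ends up in the other's list, which is what an undirected adjacency list requires.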

The connected components of G" are then determined by
traversing the graph G", using any standard traversal
technique. For each connected component of G", all the
linear lists corresponding to the T_i's in the component are
merged together and the root r_j of smallest depth among all
the roots of these T_i's is determined. This r_j and the
vertices in the list resulting from the merge form the
vertex set of a biconnected component in G. Moreover, from
Theorem 4.5, r_j is a separation vertex of G if r_j≠r. To
determine if the root r is a separation vertex of G, we
proceed as follows. A Boolean variable called once is
initialized to false at the beginning. Whenever a component
of G" is completely traversed, the corresponding vertex r_j
is examined. If it is r, the variable once is examined. If
once has the value false, it is set to true and r
is discarded. Otherwise, by Theorem 4.5, r must be a
separation vertex of G.
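A minimal sketch of this merging step is given below, assuming G" is already available as a dictionary of adjacency lists. All names (`combine`, `g2`, `lists`, `depth`) are illustrative, and the traversal is a plain stack-based search standing in for "any standard traversal technique".

```python
def combine(g2, lists, F, depth, r):
    """Return (vertex sets of biconnected components, separation vertices).

    g2:    adjacency lists of G" keyed by representatives s_k
    lists: s_k -> vertices of T_k without its root
    F:     father of every non-root vertex; depth: depth of every vertex
    r:     root of the spanning tree T
    """
    seen = set()
    components = []
    separation = set()
    once = False
    for start in lists:
        if start in seen:
            continue
        seen.add(start)
        todo = [start]
        merged, roots = [], []
        while todo:                      # traverse one component of G"
            s = todo.pop()
            merged.extend(lists[s])      # merge the linear lists
            roots.append(F[s])           # root of T_s is F(s)
            for t in g2.get(s, []):
                if t not in seen:
                    seen.add(t)
                    todo.append(t)
        r_j = min(roots, key=lambda x: depth[x])
        components.append({r_j, *merged})
        if r_j != r:
            separation.add(r_j)
        elif once:
            separation.add(r)            # r roots more than one BC_j
        else:
            once = True
    return components, separation
```

For the 4-cycle example with lists {2: [3, 2], 4: [4]} and G" joining 2 and 4, the two lists merge into a single component whose root is 1, and no separation vertex is reported.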

When finally all the components of G" are determined,
the vertex sets of all the biconnected components as well as
the separation vertices of G are also determined. Figure
4.1(iii) illustrates how the trimmed-subtrees depicted in
Figure 4.1(ii) are combined to form the BC_j's. The
correctness of step 4 should be obvious from the above
discussion. The time complexity of step 4 is analyzed as
follows.

Theorem 4.6: Step 4 of Algorithm Seq-biconnect takes
O(n+|E|) time and space.

Proof: The construction of the adjacency list of G" takes
O(n+|E|) time. Traversing G" so as to determine the vertex
sets of all the biconnected components of G takes O(n+|E|)
time, and the creation of the vector superV takes O(n) time.
As for the space complexity, superV takes O(n) space and the
adjacency list of G" is clearly bounded by O(n+|E|) even if
G" is a multigraph. Hence, step 4 can be done in O(n+|E|)
time and space. ■

In summary, Algorithm Seq-biconnect takes O(n+|E|)
time and space to generate the vertex sets of all the
biconnected components and the set of all separation
vertices of G.

4.2.4 Discussion of Other Related Work

Consider what happens if the directed spanning tree T
is a depth-first search spanning tree of G. In
this case, no cross edges [TARJ72] exist. This implies that
the '~' relationship does not exist between any two distinct
T_i's. Thus step 4 can be omitted. As for step 3, since all the
edges (v,u) in E-E' are back edges [TARJ72], LCA(v,u)=u or v
depending on which is the ancestor of the other. As a
consequence, the value low(v) becomes:

low(v) = min({F(v)} ∪ {low(s) | s is a son of v} ∪ {w | (v,w)
is a back edge in G-T}).

Comparing low(v) with lowpt(v) in [TARJ72, p.151] and
procedure CreateTi with procedure BICONNECT in
[TARJ72, p.153], it is obvious that they are basically
equivalent. Hence, the depth-first search algorithm for
determining the biconnected components [TARJ72] is a special
case of our algorithm. Clearly, our sequential algorithm
could also detect the bridges and hence the bridge-connected
components of G within the same time and space bounds.

Remark:

Recently, Tarjan has independently achieved a similar
result [TARJ82] by using another technique which does not
involve computing the LCA values. His algorithm is not a
generalization (in the sense described above) of the
depth-first search algorithm [TARJ72].

4.3 A General Program Scheme for Finding Bridges

4.3.1 The General Program Scheme

In this section, we present a general program scheme
for finding the bridges of an undirected graph G(V,E). We
shall show that by substituting the parameters in the
program scheme with various specific functions, a number of
optimal algorithms for finding the bridges can be derived.
Included in these are the known optimal sequential
algorithms as well as new parallel algorithms for finding
the bridges.

The general program scheme is based on the following
lemma, which was stated in a different way in Theorem 2.20.

Lemma 4.7: Let T(V,E') be a directed spanning tree of a
connected, undirected graph G(V,E) and e=<F(a),a>∈E'. e is a
bridge in G iff for every descendant v of a, if (v,w)∈E-E',
then w is a descendant of a.


The General Program Scheme.

Input: The adjacency matrix or list of G(V,E);
Output: The set of all bridges of G(V,E);

1. Find a directed spanning tree T(V,E') of G(V,E);

2. Define g:(E-E')∪E"→N, θ_1:V→N, θ_2:V→N, where E" is the
set {(v,v)|v∈V}, N is the set of integers, such that the
following condition is satisfied:
for every a∈V, let v be any descendant of a, and
(v,w)∈(E-E')∪{(v,v)},
then θ_1(a)≤g(v,w)≤θ_2(a) iff w is a descendant of a.

3. For every v∈V, find
L(v)=min{g(v,w) | (v,w)∈(E-E')∪{(v,v)}};
H(v)=max{g(v,w) | (v,w)∈(E-E')∪{(v,v)}}.

4. For every a∈V, find
min(a)=min{L(v) | v is a descendant of a in T};
max(a)=max{H(v) | v is a descendant of a in T}.

5. For every a∈V,
(F(a),a) is a bridge iff θ_1(a)≤min(a) and max(a)≤θ_2(a).

Theorem 4.8: The general program scheme correctly finds all
the bridges of G(V,E).

Proof: From the definitions of min(a) and max(a),

θ_1(a)≤min(a)≤max(a)≤θ_2(a)

iff θ_1(a)≤L(v) and H(v)≤θ_2(a) ∀v∈V, where v is a descendant
of a

iff θ_1(a)≤g(v,w)≤θ_2(a), ∀(v,w)∈(E-E')∪{(v,v)}, where v is a
descendant of a

iff for every descendant v of a, if (v,w)∈(E-E')∪{(v,v)},
then w is a descendant of a (the condition given in Step 2)

iff (F(a),a) is a bridge in G (Lemma 4.7). ■
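Steps 3 to 5 of the scheme can be sketched sequentially as follows, with g and the two bounding functions passed in as parameters. The spanning tree of Step 1 is assumed to be given already as a hypothetical `parent` dictionary (son to father) together with the list of non-tree edges; none of these names come from the thesis.

```python
def find_bridges(root, parent, nontree_edges, g, theta1, theta2):
    """Return the set of bridges (F(a), a) of a connected graph.

    parent: son -> father for every non-root vertex of the spanning tree
    nontree_edges: undirected edges of E - E' as (v, w) pairs
    g, theta1, theta2: the parameters of the general program scheme
    """
    vertices = [root] + list(parent)
    # Step 3: L(v) and H(v) over (E - E') plus the loop (v, v).
    low = {v: g(v, v) for v in vertices}
    high = dict(low)
    for v, w in nontree_edges:
        for a, b in ((v, w), (w, v)):   # each edge is seen from both ends
            low[a] = min(low[a], g(a, b))
            high[a] = max(high[a], g(a, b))
    # Step 4: fold every vertex into its father, sons before fathers.
    children = {v: [] for v in vertices}
    for son, father in parent.items():
        children[father].append(son)
    order, todo = [], [root]
    while todo:
        v = todo.pop()
        order.append(v)
        todo.extend(children[v])
    for v in reversed(order):
        if v != root:
            low[parent[v]] = min(low[parent[v]], low[v])
            high[parent[v]] = max(high[parent[v]], high[v])
    # Step 5: low[a] and high[a] now hold min(a) and max(a).
    return {(parent[a], a) for a in parent
            if theta1(a) <= low[a] and high[a] <= theta2(a)}
```

As a usage example, take the tree 1, 2, 3, 4 (a path, named by preorder numbers) with the non-tree edge (3,1), and choose g(v,w)=w, θ_1(a)=a and θ_2(a)=a+nd(a)-1 with nd = {1: 4, 2: 3, 3: 2, 4: 1}; the call then reports the single bridge (3,4).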

4.3.2 Implementation on the Sequential RAM

Theorem 4.9: The general program scheme takes
max(O(n+|E|),T(g,θ_1,θ_2)) time and max(O(n+|E|),S(g,θ_1,θ_2))
space to find the set of bridges on the sequential RAM,
where T(g,θ_1,θ_2) and S(g,θ_1,θ_2) are the time and space
needed to compute the functions g, θ_1 and θ_2.

Proof: Using the adjacency list of G, Steps 1 and 3 can
clearly be done in O(n+|E|) time and space. Due to the
associativity of min, min(a)=min({min(a_s) | a_s is a son of
a}∪{L(a)}). The same argument applies to max(a). Therefore,
by simply traversing the spanning tree T in preorder or
postorder, Step 4 can be done in O(n+|E|) time and space.
Step 5 takes O(n) time and space. Hence the general program
scheme takes max(O(n+|E|),T(g,θ_1,θ_2)) time and
max(O(n+|E|),S(g,θ_1,θ_2)) space. ■

Based on the above general program scheme, several
optimal sequential algorithms for finding the bridges of G
can be generated as follows.

Corollary 4.10: Let g(v,w)=pre(w) ∀(v,w)∈(E-E')∪{(v,v)};
θ_1(a)=pre(a);
θ_2(a)=pre(a)+nd(a)-1, ∀a∈V, where nd(a) is the number
of descendants of a and pre(a) is the preorder number of a.
Then the general program scheme finds the bridges of G(V,E)
in O(n+|E|) time and space.

Proof: It is easy to show that for every a∈V, if v is a
descendant of a and (v,w)∈(E-E')∪{(v,v)}, then w is a
descendant of a iff pre(a)≤pre(w)≤pre(a)+nd(a)-1. Therefore
the resulting program scheme correctly identifies all the
bridges. Furthermore, pre(v) ∀v∈V can be computed in O(n)
time and space [HORO79]. nd(v) ∀v∈V can be computed in
O(n+|E|) time and space by using the fact that
nd(v)=Σ_i nd(i)+1, ∀i∈V_s, where V_s is the set of all sons of v.
Hence T(g,θ_1,θ_2)=S(g,θ_1,θ_2)=O(n+|E|). ■

It is interesting to note that when T is a depth-first
search spanning tree, Corollary 4.10 is equivalent to the
depth-first search algorithm for finding the
bridges [EVEN79, p.67, Ex. 3.7].
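The two quantities used in Corollary 4.10 can be computed in one traversal. The sketch below, with hypothetical names, assigns preorder numbers on the way down and sums descendant counts on the way up, counting each vertex as a descendant of itself so that the interval test pre(a)≤pre(w)≤pre(a)+nd(a)-1 applies as stated.

```python
def pre_and_nd(root, children):
    """Return (pre, nd) for a rooted tree given as vertex -> list of sons."""
    pre, nd = {}, {}
    counter = 0
    stack = [(root, False)]
    while stack:
        v, finished = stack.pop()
        if finished:
            # all sons are done: nd(v) = sum of nd over the sons, plus 1
            nd[v] = 1 + sum(nd[s] for s in children.get(v, []))
        else:
            counter += 1
            pre[v] = counter
            stack.append((v, True))
            for s in reversed(children.get(v, [])):
                stack.append((s, False))
    return pre, nd


def is_descendant(w, a, pre, nd):
    """True iff w lies in the subtree rooted at a (a counts as its own descendant)."""
    return pre[a] <= pre[w] <= pre[a] + nd[a] - 1


pre, nd = pre_and_nd("r", {"r": ["x", "y"], "x": ["z"]})
```

In this small tree, z is a descendant of x because pre(z)=3 lies in the interval [2,3], while y is not because pre(y)=4 falls outside it.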

Corollary 4.11: Let g(v,w)=post(w), ∀(v,w)∈(E-E')∪{(v,v)};
θ_1(a)=post(a)-nd(a)+1;
θ_2(a)=post(a), ∀a∈V, where post(a) is the postorder
number of a.
Then the general program scheme finds the bridges of G(V,E)
in O(n+|E|) time and space.

Proof: It is easily proved that for every a∈V, if v is a
descendant of a and (v,w)∈(E-E')∪{(v,v)},
then w is a descendant of a iff
post(a)-nd(a)+1≤post(w)≤post(a). Moreover, since post(v)
∀v∈V can be computed in O(n) time and space [HORO79] and
nd(v) ∀v∈V can be computed in O(n+|E|) time and space,
T(g,θ_1,θ_2)=S(g,θ_1,θ_2)=O(n+|E|). ■

Note that this algorithm is equivalent to that of
Tarjan [TARJ74].

Corollary 4.12: Let g(v,w)=depth(LCA(v,w)),
∀(v,w)∈(E-E')∪{(v,v)}, where LCA(v,w) is the lowest common
ancestor of v and w in T, and depth(a) is the depth of a in T;
θ_1(a)=depth(a);
θ_2(a)=n (note that depth(v)<n ∀v∈V), ∀a∈V.
Then the general program scheme finds all the bridges in
O(n+|E|) time and space.

Proof: It is easily proved that for every a∈V, if v is a
descendant of a and (v,w)∈(E-E')∪{(v,v)},
then w is a descendant of a iff depth(a)≤depth(LCA(v,w))<n.
Computing LCA(v,w) ∀(v,w)∈(E-E')∪{(v,v)} takes O(n+|E|) time
and space [HARE80] and computing depth(v) ∀v∈V takes O(n)
time and space. Hence T(g,θ_1,θ_2)=S(g,θ_1,θ_2)=O(n+|E|). ■
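The choice of Corollary 4.12 can be illustrated with the sketch below. It is not the linear-time method cited above: the lowest common ancestor is found by naively walking up the tree, and vertices are ordered by sorting on depth, so the running time is worse than O(n+|E|). The function name and the `parent` representation are assumptions made for the example.

```python
def bridges_by_lca_depth(root, parent, nontree_edges):
    """Bridges of a connected graph via g(v,w) = depth(LCA(v,w)).

    parent: son -> father for every non-root vertex of the spanning tree
    nontree_edges: undirected edges of E - E' as (v, w) pairs
    """
    depth = {root: 0}
    for v in parent:
        path = []
        while v not in depth:
            path.append(v)
            v = parent[v]
        for u in reversed(path):
            depth[u] = depth[parent[u]] + 1

    def lca(v, w):
        # naive walk towards the root; a stand-in for the off-line algorithm
        while v != w:
            if depth[v] >= depth[w]:
                v = parent[v]
            else:
                w = parent[w]
        return v

    # L(v) starts from the loop (v,v), for which g(v,v) = depth(v).
    low = dict(depth)
    for v, w in nontree_edges:
        d = depth[lca(v, w)]
        low[v] = min(low[v], d)
        low[w] = min(low[w], d)
    # min(a): fold sons into fathers, deepest vertices first.
    for v in sorted(parent, key=depth.get, reverse=True):
        low[parent[v]] = min(low[parent[v]], low[v])
    # Every g value is below n, so only the lower bound needs testing.
    return {(parent[a], a) for a in parent if depth[a] <= low[a]}
```

Only the lower bound θ_1(a)=depth(a) is tested, since max(a)≤n holds trivially for this choice of g.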

4.3.3 Implementation on the PRAM

Theorem 4.13: The general program scheme takes
max(O(n/K+lg^2 n),T(g,θ_1,θ_2)) time with nK (K>1) processors to
find the bridges of G(V,E) on the PRAM, where T(g,θ_1,θ_2) is
the time taken to compute (define) the functions g, θ_1 and
θ_2 with nK (K>1) processors.

Proof: By Lemma 2.2, L(v), H(v), min(a) and max(a) ∀v,a∈V
can all be determined in O(n/K+lg K) time with nK (K>1)
processors. Step 5 clearly takes constant time and step 1
takes O(n/K+lg^2 n) time with nK (K>1) processors (Theorem 2.5).
Hence, the general program scheme takes
max(O(n/K+lg^2 n),T(g,θ_1,θ_2)) time with nK (K>1) processors. ■

As with the sequential machines, optimal parallel
algorithms can be derived from the general program scheme by
using preorder, postorder and LCA on the PRAM.

Corollary 4.14: By defining g, θ_1 and θ_2 in one of the
following ways, the general program scheme runs in
O(n/K+lg^2 n) time with nK (K>1) processors on the PRAM.

(i) Let g(v,w)=pre(w) ∀(v,w)∈(E-E')∪{(v,v)};
θ_1(a)=pre(a);
θ_2(a)=pre(a)+nd(a)-1, ∀a∈V.

(ii) Let g(v,w)=post(w) ∀(v,w)∈(E-E')∪{(v,v)};
θ_1(a)=post(a)-nd(a)+1;
θ_2(a)=post(a), ∀a∈V.

(iii) Let g(v,w)=depth(LCA(v,w)) ∀(v,w)∈(E-E')∪{(v,v)};
θ_1(a)=depth(a);
θ_2(a)=n.

Proof: For (i) and (ii), pre(v), post(v) and nd(v) ∀v∈V can
be computed in O(n/K+lg n) time with nK (K>1) processors (Lemma
2.15). For (iii), the resulting algorithm is Algorithm
Bridges presented in Chapter 2. In any of these cases, we
have T(g,θ_1,θ_2)=O(n/K+lg n). Hence the resulting parallel
algorithm takes O(n/K+lg^2 n) time with nK (K>1) processors. ■

4.4 Conclusions

Recently, Shiloach and Vishkin designed a parallel
algorithm for the max-flow problem which runs in O(n^3 lg n/p)
time using p (1≤p≤n) processors on the WRAM [SHIL82b]. This
algorithm can at best achieve the O(n^2 lg n) time bound with n
processors. However, they managed to derive a sequential
algorithm from it which has O(n^3) time complexity. They
claim that the design of parallel algorithms could provide
insight into the design of sequential algorithms for the
same problem. We share their feeling.

Chapter 5

PROBABILISTIC TIME, EXPECTED TIME AND O(lg n) TIME
COMPLEXITIES

5.1 Introduction

In Chapter 3, it was shown that all of our algorithms
run in O(n) time in the worst case on the MCN. This time
bound is easily seen to be optimal as routing itself takes
O(n) time in the worst case on that model. It was also shown
that all the algorithms run in O(lg n) time in the worst
case on the WRAM. Although Shiloach and Vishkin conjectured
that it is difficult to breach this O(lg n) worst case time
bound using a polynomial number of processors on the
WRAM [SHIL82], no proof has been given. As for the PRAM and
other more restrictive models, it was shown that the
algorithms run in O(lg^2 n) time in the worst case. Although
it is likely that this is a lower bound for time, no one has
yet managed to prove it. It is therefore intriguing to ask:
Can the O(lg n) worst case time bound be breached on the WRAM
and the O(lg^2 n) worst case time bound be breached on the PRAM?
Recently, Reif showed that if probability error in the
solution is allowed, then he could solve some of the
problems in O(lg n) time with a polynomial number of
processors on the PRAM and that the probability error could
be eliminated by introducing nonuniformity [REIF82a]. More
recently, Reif and Spirakis showed that some of the existing
graph algorithms do have O(loglg n) expected time complexity
on the WRAM and O(lg n·loglg n) expected time complexity on
the PRAM [REIF82b].

In this chapter, we shall show that the algorithms
presented in the previous chapters could run in O(lg n) time
using |E|n^3 lg n processors if probability error is allowed.
We shall also show that most of these algorithms have
O(loglg n) expected time complexity on the WRAM and
O(lg n·loglg n) expected time complexity on the PRAM,
SIMD-CCC, OTN, OTC, CCC and PSN. Finally, we shall show that
the recognition problems of split graphs and permutation
graphs do have O(lg n) (deterministic) time algorithms. Reif
only showed that they have O(lg n) probabilistic time
algorithms [REIF82a] and no other logarithmic time algorithms
were known before.

5.2 Probabilistic Time Complexity

Recently, Reif considered the possibility of breaching
the O(lg^2 n) time bound for the connectivity problems and the
planarity testing problem. He showed that if probability
error is allowed, then the O(lg^2 n) time bound can be
breached. His method is based on Aleliunas, Karp, Lipton,
Lovasz and Rackoff's result on random walks on connected
undirected graphs [ALEL79] and Lewis and Papadimitriou's
nondeterministic O(lg n) space algorithm for the UGAP problem
(given a connected undirected graph G(V,E) and u,v∈V, does
there exist a path from u to v in G?) [LEWI82]. Aleliunas et
al. showed the following:

   Given any connected undirected graph G(V,E), let r
   be a random walk in G starting from any vertex v∈V. r
   is constructed by repeated extension, randomly choosing
   an edge which is connected to the current front end of
   r and adding it to r. If r is of length 2|E|(|V|-1),
   then Prob(r visits all vertices in G)≥1/2.
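The statement can be explored empirically with a small simulation. The sketch below is only an experiment aid under stated assumptions: the graph is simple and given as adjacency lists, so choosing a neighbour uniformly is the same as choosing an incident edge uniformly, and the helper names are hypothetical.

```python
import random


def random_walk_visits(adj, start, rng):
    """Walk 2|E|(|V|-1) steps from start and return the set of visited vertices."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2   # number of edges
    visited = {start}
    current = start
    for _ in range(2 * m * (n - 1)):
        current = rng.choice(adj[current])   # extend the walk by one edge
        visited.add(current)
    return visited


def cover_frequency(adj, start, trials, seed=0):
    """Fraction of trials in which the walk visits every vertex."""
    rng = random.Random(seed)
    hits = sum(random_walk_visits(adj, start, rng) == set(adj)
               for _ in range(trials))
    return hits / trials
```

On a graph with a single edge the walk has length 2 and always covers both vertices; on larger graphs `cover_frequency` gives an estimate that can be compared with the bound of 1/2.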

Using the result of Aleliunas et al., Reif devised a
probabilistic search technique to implement Lewis and
Papadimitriou's UGAP algorithm in O(lg n) space.11
Specifically, he showed that given any probability error e,
0<e<1, the UGAP problem can be solved in O(lg n) space and
n^O(1) time within error e. By solving a problem within error
e, he means that given any problem instance ω of UGAP, if
the answer to ω is yes, then the probability that the
algorithm produces the answer yes is greater than or equal
to 1-e. If the answer to ω is no, then the probability that
the algorithm produces the answer yes is less than e.
Observing that deterministic PRAM's can accept within
polynomial time exactly the sets that deterministic Turing
machines can accept within polynomial space [GOLD78,WYLL79],
Reif proceeded to show that given any probability error e,
0<e<1, the UGAP problem can be solved within error e in
O(lg n) time with n^O(1) processors on the PRAM. Using this
UGAP algorithm, Reif managed to implement Kruskal's greedy
algorithm for the minimum spanning forest
problem [HORO79, pp.179-183] within error e in O(lg n) time
with |E|·p processors on the PRAM (p is the number of
processors used by the probabilistic O(lg n) time UGAP
algorithm). In [REIF82c], Reif claimed that p=n^3 lg n. As a
result, we have:

Lemma 5.1: [REIF82a] For any probability error e, 0<e<1,
there is a parallel algorithm which finds a minimum spanning
forest for an undirected graph within error e in O(lg n) time
with |E|n^3 lg n processors on the PRAM.

11 Reif's result is more formal and general. We have
tailored his result here to suit our needs. Readers who are
interested in his work are encouraged to consult
[REIF82a].

Using this result, it is easily shown that:

Lemma 5.2: For any probability error e, 0<e<1, there exists
an O(lg n) time probabilistic parallel algorithm for finding
an inverted spanning forest using |E|n^3 lg n processors on the
PRAM.

Proof: First, find a minimum spanning forest T for the
undirected graph in O(lg n) time within probability error e,
0<e<1, using |E|n^3 lg n processors [REIF82]. Then convert T
into an inverted spanning forest using Lemma 3.17. ■

Theorem 5.3: For any probability error e, 0<e<1, the class
of algorithms described in Chapter 2 could run in O(lg n)
time within error e using |E|n^3 lg n processors on the PRAM.

Proof: First note that by constructing an inverted spanning
forest for an undirected graph, we can determine the
connected components of the undirected graph in O(lg n) time
as follows: for every vertex v, associate v with the root
of the tree in which v resides; then u, v belong to the same
connected component iff u and v are associated with the same
root. These roots can be identified easily if we use the
array F+. The whole process clearly takes no more than
O(lg n) time if n⌈n/lg n⌉ processors are available (Theorem
2.3). As a result, by using Lemma 5.2, we can construct a
directed spanning forest or determine the connected
components of an undirected graph in O(lg n) time with
|E|n^3 lg n processors within error e. Furthermore, it is
easily confirmed that all the other steps in the algorithms
do not take more than O(lg n) time with n⌈n/lg n⌉ processors.
The theorem thus follows. ■
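One way to associate every vertex with the root of its tree in a logarithmic number of rounds is pointer jumping, sketched below as a sequential simulation. This is a swapped-in illustration and not the F+ construction referred to in the proof, which is defined elsewhere in the thesis. It assumes the inverted spanning forest is given as a list `F` of fathers over vertices 0..n-1, with every root pointing to itself.

```python
def find_roots(F):
    """Return, for each vertex, the root of its tree in the inverted forest F.

    Assumption: F[v] is the father of v and F[root] == root.
    Each round replaces every pointer by the pointer of its target, which
    is what n processors would do simultaneously on a PRAM.
    """
    n = len(F)
    root = list(F)
    for _ in range((n - 1).bit_length()):    # ceil(lg n) rounds
        root = [root[root[v]] for v in range(n)]
    return root


def same_component(u, v, root):
    """u and v are in the same connected component iff their roots agree."""
    return root[u] == root[v]
```

After k rounds each pointer spans 2^k father links, so ⌈lg n⌉ rounds suffice for any tree on at most n vertices.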

In addition to the result on random walks for connected
undirected graphs, Aleliunas, Karp, Lipton, Lovasz and
Rackoff also gave an affirmative answer to a question from
Cook concerning the existence of short n-universal
sequences. An n-universal sequence is defined as follows:
"Let G be a connected undirected regular graph of degree d.
At each vertex v, let the edges incident with v be given the
distinct labels 0, 1, 2, ..., d-1. A sequence σ in
{0, 1, 2, ..., d-1}* is said to traverse G from v if, starting at
v and following the sequence of edge labels σ, one visits
all the vertices of G. σ is called an n-universal sequence
if it traverses every n-vertex regular graph G with degree d
starting from any vertex v." Aleliunas et al. showed that
there exists an n-universal sequence of length O(n^3 lg n).
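
To make the definition concrete, the following sketch checks whether a given label sequence traverses one particular labelled regular graph from one start vertex. The table `adj[v][label]` (the neighbour reached from v along the edge carrying that label) is a hypothetical representation chosen for the example; it is not taken from the cited work.

```python
def traverses(adj, start, seq):
    """Return True iff following the labels in seq from start visits every vertex.

    adj[v][label] is the vertex reached from v along the edge labelled
    `label` at v (illustrative representation of a labelled regular graph).
    """
    visited = {start}
    v = start
    for label in seq:
        v = adj[v][label]
        visited.add(v)
    return len(visited) == len(adj)


def traverses_from_every_vertex(adj, seq):
    """The per-graph part of universality: seq must work from any start."""
    return all(traverses(adj, s, seq) for s in range(len(adj)))
```

A sequence is n-universal only if the second check succeeds for every n-vertex regular graph of degree d and every labelling, which is why short universal sequences are not obvious to construct.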

By replacing the probabilistic choice in his
probabilistic search technique with an n-universal sequence,
Reif showed that the probabilistic error in his algorithm
can be eliminated. However, as each n-universal sequence is
good for only a particular n, the resulting algorithm
becomes nonuniform in the sense that there is a different
program for each different n. Consequently, we have:

Corollary 5.4: The set of graph theoretic problems
investigated in Chapter 2 can be solved in O(lg n) time using
|E|n^3 lg n processors with a nonuniform algorithm on the PRAM.

5.3  Expected  Time  Complexity 

More recently, Reif and Spirakis showed that given a
random (directed or undirected) graph G, the diameter d of G
has an expected length O(lg n) [REIF82b]. Based on this
result, they showed that some existing parallel graph
algorithms, particularly those for graph connectivity
and minimum spanning forest, have an O(lg n · lg lg n) expected
time complexity on the PRAM and an O(lg lg n) expected time
complexity on the WRAM. Combining their results on the
average length of diameters with ours stated in Chapter 3,
we immediately have:

Lemma 5.5: With the exception of Algorithm Brconnect and
Algorithm Biconnect, all the algorithms presented in Chapter
3 have an O(t(n) · lg lg n) expected time bound with H(n)
hardware resources on the MMM.

Proof: Since d = O(lg n) on the average [REIF82b],
L = O(lg lg n). ■

Unfortunately,  Reif’s  result  on  the  expected  length  of 
diameters  cannot  be  applied  to  Algorithm  Brconnect  and 
Algorithm  Biconnect.  This  is  because  the  structures  of  the 
graphs  G0 (V 0 ,E0UA ^  )  and  G” (£',£")  depend  on  the  given  graph 
G(V ,E)  and  are  therefore  not  random  graphs. 

5.4  O(lgn)  Time  Algorithms  for  Split  Graphs  and  Permutation 
Graphs 

Split graphs and permutation graphs arise in many
contexts and have received considerable attention in the
past decade. The former belong to the class of chordal
graphs (triangulated graphs), which have important
applications in Gaussian elimination, genetic research, etc.
The latter were shown to be useful in modelling and system
programming, for example in memory reallocation. The fastest
previously known sequential algorithm for identifying split
graphs takes O(n+|E|) time [ROSE76, FOLD77], while that for
identifying permutation graphs takes O(n^3) time [EVEN72]. No
parallel algorithms exist for problems of this class except
Reif's O(lg n) time probabilistic parallel algorithm and
O(lg n) time nonuniform parallel algorithm [REIF82a] for the
PRAM. However, his algorithm for split graphs does not
generate a split if the result of the identification is
positive. In this chapter, we show that there are indeed O(lg n)
(deterministic) time algorithms for the recognition problems
of these two classes of graphs on the PRAM. Furthermore, the
algorithm for split graphs uses ⌈n^2/lg n⌉ processors.
Unfortunately, since the splitting property of a graph is
not monotone, we do not know whether this processor bound is
optimal. Finally, we show that these algorithms can be
implemented on the MMM taking O(t(n)) time and H(n) hardware
resources and that the algorithms can be converted into
O(n+|E|) time and space optimal sequential algorithms.

5.5  Identification  of  Split  Graphs 

Lemma 5.6: G(V,E) is a split graph iff G has a split
G1(V1,E1), G2(V2,E2) such that G1 is independent and G2 is a
clique.

Proof: The "if" part is obvious.

The "only if" part: If G is a split graph, then G has a
split G1, G2 where G2 is a complete subgraph. If G2 is not a
clique, then there exists a vertex v∈V1 such that ({v}×V2)
is a subset of E. Moreover, there does not exist another
u∈V1 for which ({u}×(V2∪{v})) is a subset of E, for otherwise
G1 cannot be independent. Thus, the subgraphs G1'(V1-{v},E1),
G2'(V2∪{v},E2∪({v}×V2)) form a split of G in which G2' is a
clique. ■

Due to Lemma 5.6, we may, without loss of generality,
assume that whenever we speak of a split G1, G2 of a split
graph, G1 is independent while G2 is a clique. We will adopt
this assumption in subsequent discussion.

Apparently, if G is a split graph, then just by finding
a clique G2 in it, one should be able to conclude that G is
a split graph as the remaining part G-G2 should be
independent. Unfortunately, this is not the case, as is
depicted in Figure 5.1.

In  Figure  5.1,  the  graph  G  has  three  cliques.  Only  the 
one  determined  by  the  vertex  set  { a,b,d }  leads  us  to  the 
decision  that  G  is  a  split  graph.  It  is  therefore  important 
to  be  able  to  distinguish  between  those  cliques  which  would 
lead  us  to  the  right  decision  that  G  is  a  split  graph  (if  G 
is  indeed  a  split  graph)  and  those  which  would  not.  The 
following  lemma  sheds  some  light  on  this  matter. 

Lemma 5.7: If G(V,E) is a split graph and G1(V1,E1),
G2(V2,E2) form a split,

then (i) deg(v) ≤ |V2|-1 ∀v∈V1;
and (ii) deg(w) ≥ |V2|-1 ∀w∈V2,
where deg(v) stands for the degree of v.

Proof: (i) Let v∈V1. Then (v,u)∉E ∀u∈V1 because E1=∅.
Therefore, deg(v) ≤ |V2|. But deg(v)=|V2| implies that
G2(V2,E2) is not a clique. Hence, deg(v) ≤ |V2|-1.

(ii) Immediate from the definition of complete graphs. ■

Corollary 5.8: Let G(V,E) be a split graph. For any u∈V1,
v∈V2, deg(u) ≤ deg(v).

Proof: Immediate from Lemma 5.7. ■

{a,b,c}, {a,b,d}, and {b,d,e} are cliques.

Only {{a,b,d}, {c,e}} induces a split;

{{a,b,c}, {d,e}} and {{b,d,e}, {a,c}} do not.


Figure 5.1

Corollary 5.8 indicates that if we sort the vertex set
V by degree of vertex in descending order, then those
vertices in V2 will precede those in V1 in the sorted
sequence, with an intermixed region in between which contains
the set of all vertices in V1 and V2 having the same degree
|V2|-1. Therefore, to identify the vertex set of the clique
G2, we have to be able to identify those vertices of V2 in
the intermixed region of the sorted sequence. Fortunately,
this is not a difficult task due to the following lemma.

Lemma 5.9: Let G1(V1,E1), G2(V2,E2) be a split of a split
graph G(V,E). Let C1 = {u∈V1 | deg(u) = |V2|-1} and
C2 = {v∈V2 | deg(v) = |V2|-1}; then for any u∈C1 and any v∈C2,
G1'(V1-{u}∪{v}, E1) and
G2'(V2-{v}∪{u}, E2∪({u}×(V2-{v}))-(V2×{v})) is also a split of
G.

Proof: Since deg(v) = |V2|-1 and G2 is a clique, (u,v)∉E.
Furthermore, as deg(u) = |V2|-1 and G1 is independent,
G2'(V2-{v}∪{u}, E2∪({u}×(V2-{v}))-(V2×{v})) must be a clique.
Since deg(v) = |V2|-1 and v∈V2, (w,v)∉E ∀w∈V1. This implies
that G1'(V1-{u}∪{v}, E1) is independent. Hence, G1', G2' is a
split of G. ■

The above lemma implies that if we sort the vertex set
of a split graph G by degree of vertex in descending order,
then the first k vertices and the remaining n-k vertices in
the sorted sequence always constitute the vertex sets of a
split of G, where k = max{ i | deg(v_i) ≥ i-1 } and v_i is the
i-th vertex in the sorted sequence.

Hence  we  have  the  following  characterization  theorem 
for  split  graphs. 

Theorem 5.10: Let G(V,E) be an undirected graph and v_1, v_2,
v_3, ..., v_n be the sequence of vertices of G sorted by
degree of vertex in descending order.

G is a split graph iff {v_1, v_2, v_3, ..., v_k} induce a
clique in G while {v_(k+1), v_(k+2), ..., v_n} induce an independent
subgraph of G, where k is defined as above.

Proof: By Lemmas 5.7, 5.9 and the definition of split graph. ■


Algorithm : Split

(* This algorithm examines if an undirected graph is a split
graph and produces a split if it is.
deg(v) stands for degree of v *)


1. Compute deg(v) ∀v∈V. Sort V by degree of vertex in
descending order.

2. Find k such that k = max{ i | deg(v_i) ≥ i-1 and deg(v_(i+1)) ≤ i-1 };

3. Let V1 = {v_(k+1), v_(k+2), ..., v_n}; V2 = {v_1, v_2, ..., v_k}.

Check if V2 forms a clique in G. If not, then G is not a
split graph.

4. Check if V1 induces an independent set in G. If not, then G
is not a split graph.

5. Declare G is a split graph and G1(V1,E1), G2(V2,E2) is a
split. ■
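
The five steps can be tried out sequentially with the sketch below. It is an illustration under assumptions, not the parallel implementation analysed in this chapter: the graph is passed as a hypothetical dictionary `adj` mapping each vertex to the set of its neighbours, and the function name `find_split` is invented for the example. The sample graph in the usage note is one edge set consistent with the three cliques listed for Figure 5.1.

```python
def find_split(adj):
    """Return (V1, V2) if the graph is a split graph, else None.

    adj maps each vertex to the set of its neighbours in a simple
    undirected graph (illustrative representation).
    """
    # Step 1: sort the vertices by degree in descending order.
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    # Step 2: k is the largest 1-based position i with deg(v_i) >= i - 1.
    k = max((i for i, v in enumerate(order, start=1)
             if len(adj[v]) >= i - 1), default=0)
    # Step 3: the first k vertices must be pairwise adjacent.
    v2 = set(order[:k])
    v1 = set(order[k:])
    if any(len(adj[v] & v2) != k - 1 for v in v2):
        return None
    # Step 4: no vertex of V1 may have a neighbour inside V1.
    if any(adj[v] - v2 for v in v1):
        return None
    # Step 5: report the split.
    return v1, v2
```

For example, with `adj = {'a': {'b','c','d'}, 'b': {'a','c','d','e'}, 'c': {'a','b'}, 'd': {'a','b','e'}, 'e': {'b','d'}}` the call returns the independent set {c, e} and the clique {a, b, d}, while a 4-cycle is rejected in Step 3.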


Theorem 5.11: Algorithm Split correctly identifies a split
graph.

Proof: Given any graph G(V,E), if G is a split graph, then
by Theorem 5.10, Algorithm Split correctly identifies G as a
split graph. If G is not a split graph, then either Step 3
detects that G2(V2,E2) is not a clique or Step 4 detects
that G1(V1,E1) is not independent. In either case, G is
identified not to be a split graph. ■

Theorem 5.12: Algorithm Split runs in O(n/K+lg n) time with
nK (K>lg n) processors on the PRAM.

Proof: Given an adjacency matrix M of G, Step 1 takes
O(n/K+lg n) time with nK (K>0) processors to compute deg(v)
∀v∈V (Lemma 2.2) and O(lg n) time with n lg n processors to
sort the vertices by degree of vertex [BORO82]. Step 2 takes
O(1) time with n processors. Step 3 takes O((k-1)/K+lg n)
time with nK (K>0) processors. Step 4 takes O((n-k-1)/K+lg n)
time with nK (K>0) processors. Hence, Algorithm Split runs in
O(n/K+lg n) time with nK (K>lg n) processors. ■

Corollary 5.13: Identifying a split graph can be done in
O(lg n) time with n⌈n/lg n⌉ processors for ⌈n/lg n⌉ > lg n.

Now, we shall implement Algorithm Split on the MMM. To
ease the task of explanation, we shall assume that the
degrees of the vertices of G(V,E) are all distinct.
Generalizing our result to the arbitrary case is
straightforward.


Theorem 5.14: Algorithm Split runs in O(t(n)) time with H(n)
hardware resources on the MMM.

Proof: Let M be the adjacency matrix. In step 1, deg(v) is
computed by:

    deg[u,v] := Σ_{k=1}^{n} M[u,k];

(clearly deg(u) = deg[u,u] = deg[u,v] ∀u,v∈V).

Order the vertices by degree of vertex as follows:

    (Broadcast columnwise:) deg'[w,v] := deg[v,v] ∀v,w∈V;
    rank[u,v] := Σ_{k=1}^{n} (deg[u,k] ≤ deg'[u,k]);

(note that rank(u) = rank[u,u] = rank[u,v], and rank(u) is
the position of u in the sorted sequence).

In step 2, k is determined as follows:

    (Erase the rank of those u whose rank does not satisfy
    the condition: deg(u) ≥ rank(u)-1.)
    rank[u,u] := if (deg[u,u] ≥ rank[u,u]-1)
                 then rank[u,u]
                 else 0;

    great[u,u] := ¬(∨_k (rank[u,k] < rank[k,u]));
    (note that great[u,u] = 1 iff u is the vertex whose rank
    is k).

In step 3, the set V is partitioned as follows:

    (Broadcast columnwise:) great'[w,v] := great[v,v];
    (Broadcast columnwise:) rank'[w,v] := rank[v,v];
    Split[u,v] := (rank[u,v] ≤ rank'[u,v]) ∧ great'[u,v];
    (Broadcast the nonzero Split rowwise. Note that there
    is at most one nonzero Split value on each row):

    Split[u,u] := Σ_{k=1}^{n} Split[u,k];

as a result, V1 = {u | Split[u,u] = 0} and
V2 = {u | Split[u,u] = 1}.

Steps 4 and 5 can be combined and tested together as below:

    (Broadcast rowwise:) Split[v,w] := Split[v,v];
    (Broadcast columnwise:) Split'[w,v] := Split[v,v];
    Flag[u,v] := (¬Split[u,v] ∧ Split'[u,v])
               ∨ (Split[u,v] ∧ ¬Split'[u,v])
               ∨ (Split[u,v] ∧ Split'[u,v] ∧ M[u,v])
               ∨ (¬Split[u,v] ∧ ¬Split'[u,v] ∧ ¬M[u,v])

The above statement should be interpreted as Flag[u,v] = 1 iff
u∈V1 and v∈V2, or u∈V2 and v∈V1, or (u,v)∈E if u,v∈V2, or
(u,v)∉E if u,v∈V1. Hence, G is a split graph iff Flag[u,v] = 1
∀(u,v)∈V×V. Therefore, after computing:

    Flag[u,v] := ∧_k (Flag[u,k] ∧ Flag[k,v])

twice, Flag[1,1] = 1 iff G is a split graph. ■
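
The rank-by-counting formulation used in the proof can be mimicked with plain lists, as in the sketch below. Several details are assumptions made for the illustration: ties between equal degrees are broken by vertex index (the proof assumes distinct degrees), diagonal entries are skipped when the flags are combined, and the name `split_by_rank_matrix` is invented here. The code runs sequentially and says nothing about the time bound on the MMM.

```python
def split_by_rank_matrix(M):
    """Return (is_split, in_v2) for a 0/1 adjacency matrix M.

    rank[u] counts the vertices that precede or equal u in descending
    degree order, so it is the 1-based position of u in that order.
    """
    n = len(M)
    deg = [sum(row) for row in M]

    def precedes_or_equal(k, u):
        # Descending degree; equal degrees ordered by index (assumption).
        return (deg[k], -k) >= (deg[u], -u)

    rank = [sum(1 for k in range(n) if precedes_or_equal(k, u))
            for u in range(n)]
    # Erase ranks that violate deg(u) >= rank(u) - 1; k is the largest left.
    kept = [rank[u] if deg[u] >= rank[u] - 1 else 0 for u in range(n)]
    k = max(kept, default=0)
    in_v2 = [rank[u] <= k for u in range(n)]

    def flag(u, v):
        if u == v:                      # diagonal ignored (assumption)
            return True
        if in_v2[u] != in_v2[v]:        # one endpoint on each side
            return True
        if in_v2[u]:                    # both in V2: must be adjacent
            return M[u][v] == 1
        return M[u][v] == 0             # both in V1: must be non-adjacent

    is_split = all(flag(u, v) for u in range(n) for v in range(n))
    return is_split, in_v2
```

The final `all(...)` plays the role of the repeated conjunction over Flag: the graph is accepted only when every pair passes its local test.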

Theorem 5.15: Algorithm Split runs on a sequential computer
in O(n+|E|) time and space.

Proof: In Step 1, we use bucket sort [AHO74, Section 3.2] to
sort the vertices in V. This takes linear time and space.
Step 2 takes O(n) time. Steps 3 and 4 take O(n+|E|) time.
Moreover, O(n+|E|) space is sufficient if we use an
adjacency list to represent the graph.

From Lemmas 3.1, 3.2 and 3.3, we have the indicated
time and hardware resource complexities. ■

5.6 Identification of Permutation Graphs

Our algorithm is based on the following
characterization theorem due to Even, Pnueli and Lempel.

Theorem 5.16: An undirected graph G(V,E) is a permutation
graph iff both G+(V,E+) and G-(V,E-) are transitive, where
G+ and G- are directed graphs induced from G such that
E+ = {<i,j> | i<j and (i,j)∈E} and E- = {<i,j> | i>j and
(i,j) ∈ V×V-(E∪{(v,v) | v∈V})}.

Proof: [EVEN72]. ■
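
The two transitivity conditions of Theorem 5.16 can be checked directly, as the following sketch shows. It assumes that the vertices are numbered 1..n and that the test is made relative to that given numbering; the function names are illustrative only. The check is a straightforward sequential one and is not meant to reflect the parallel bounds discussed in this section.

```python
def is_transitive(arcs):
    """True iff (a,b) and (b,c) in arcs always imply (a,c) in arcs."""
    succ = {}
    for a, b in arcs:
        succ.setdefault(a, set()).add(b)
    return all((a, c) in arcs
               for a, b in arcs
               for c in succ.get(b, ()))


def is_permutation_graph_labelled(n, edges):
    """Test the conditions of Theorem 5.16 for vertices numbered 1..n.

    edges is an iterable of unordered vertex pairs.
    """
    E = {frozenset(e) for e in edges}
    pairs = [(i, j) for i in range(1, n + 1)
             for j in range(1, n + 1) if i != j]
    plus = {(i, j) for i, j in pairs if i < j and frozenset((i, j)) in E}
    minus = {(i, j) for i, j in pairs
             if i > j and frozenset((i, j)) not in E}
    return is_transitive(plus) and is_transitive(minus)
```

For instance, the numbered graph with edges (1,3) and (2,3) passes both tests, whereas the path with edges (1,2) and (2,3) fails for this numbering because E+ would also need the arc <1,3>.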

Even, Pnueli and Lempel also showed how to determine
the permutation of G if G is a permutation graph.
Specifically, they formed G° = G+ ∪ G- and showed that G° is a
cycle-free directed graph whose underlying graph is
complete. Therefore, G° must have a sink s_1, namely, a
vertex which has no outgoing edge. They removed s_1 from G°
and showed that the resulting graph remains cycle-free and
its underlying graph is also complete. A sink s_2 in this
graph therefore exists. By repeating this process, they
ended up with a sequence of sinks s_1, s_2, ..., s_n. They showed
that the permutation P such that P(i) = s_i, 1≤i≤n, is a
permutation of G. To determine P efficiently in parallel, we
restate P as:

    P = {<i,P(i)> | out-degree(P(i)) = i-1 in G°, 1≤i≤n}.
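
The restated form of P lends itself to a direct computation: count the out-degree of every vertex in G° and place the vertex with out-degree i-1 at position i. The sketch below does this sequentially. It assumes vertices numbered 1..n and an input that already satisfies Theorem 5.16; the function name and the error raised for other inputs are choices made for the example.

```python
def permutation_from_out_degrees(n, edges):
    """Return [P(1), ..., P(n)] with out-degree(P(i)) = i - 1 in G°.

    G° has the arc <i,j> when i<j and (i,j) is an edge, or when i>j and
    (i,j) is not an edge. Vertices are numbered 1..n (assumption).
    """
    E = {frozenset(e) for e in edges}
    out = {v: 0 for v in range(1, n + 1)}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i == j:
                continue
            adjacent = frozenset((i, j)) in E
            if (i < j and adjacent) or (i > j and not adjacent):
                out[i] += 1
    by_out = {d: v for v, d in out.items()}
    if len(by_out) != n:
        # Out-degrees are not all distinct, so G° is not cycle-free.
        raise ValueError("input does not satisfy Theorem 5.16")
    return [by_out[i - 1] for i in range(1, n + 1)]
```

For the numbered graph with edges (1,3) and (2,3), the out-degrees in G° are 1, 2 and 0 for vertices 1, 2 and 3, which gives P = [3, 1, 2]; the inversions of that sequence are exactly the two edges.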

Based on these results, we immediately have:

Theorem 5.17: Identifying a permutation graph and
determining its permutation can be done in O(n/K+lg n) time
with n^2 K (K>1) processors on the PRAM.

Proof: Constructing G+ and G- from G takes O(n/K+lg K) time
with nK (K>1) processors (Lemma 2.2). Testing if both G+ and
G- are transitive takes O(n/K+lg n) time with n^2 K (K>1)
processors (Lemma 2.2). Moreover, if G is a permutation
graph, then determining the permutation P of G takes
O(n/K+lg K) time with nK (K>1) processors, as this is the time
and processor complexity one needs to compute
out-degree(v), for all v in G°. ■

Corollary 5.18: Identifying a permutation graph and
determining its permutation can be done in O(lg n) time with
n^3/lg n processors on the PRAM.

Theorem 5.19: Identifying a permutation graph and
determining its permutation can be done in O(t(n)) time with
H(n) hardware resources on the MMM.

Proof: Trivial. ■

5.7 Conclusions

Breaching the O(lg^2 n) time bound for graph theoretic
problems on the PRAM is one of the main concerns in algorithm
design. It seems to be a difficult task, and in fact some
have conjectured that it is impossible [KUCE82].

In this chapter, although we do not manage to develop a
general technique to surpass the O(lg^2 n) time bound, we do
show that most of our algorithms have probabilistic time
complexity and expected time bound below O(lg^2 n) and that
the recognition problems for split graphs and permutation
graphs have O(lg n) optimal time complexity. (Note that
although the lowest common ancestor algorithm has O(lg n)
time complexity, the graphs it deals with are directed trees,
which are only a subset of all graphs.) We feel that our
success in finding O(lg n) time algorithms for split graphs
and permutation graphs is due to the particular
characteristic theorems for these graphs. These
characteristic theorems allow us to process the graph
locally at each vertex without having to perform a graph
search to collect global information. This is reflected in
the algorithms by the fact that no construction of an
inverted spanning forest is necessary. The process of
collecting global information is a time-consuming process
and is the main cause of the O(lg^2 n) time complexity. As a
result, we believe that one way to breach the O(lg^2 n) time
bound for graph theoretic problems is to develop
characteristic theorems which allow us to get global
information without performing a graph search. However,
discovering such characteristic theorems seems to be very
difficult in general.

Finally, as with Chapter 2, we remark that the lower
bound for the number of processors used in identifying split
graphs and permutation graphs can be reduced to nK (K>0)
rather than nK (K>1).

Chapter 6
CONCLUSIONS

We  have  presented  algorithms  for  the  class  of  graph 
theoretic  problems  listed  in  the  introduction  to  this 
thesis.  These  algorithms  achieve  the  conjectured  lower  bound 
for  the  worst  case  time  complexity  on  many  of  the  existing 
models.  The  number  of  processors  they  require  is  optimal  in 
most  cases  for  the  PRAM.  Furthermore,  they  have  good 
expected  time  complexities  and  have  O(lgn)  probabilistic 
time  on  the  PRAM.  In  most  cases,  the  results  obtained 
provide  new  upper  bounds  for  the  problems.  Hence,  we  believe 
that the goals of 'portability' and 'efficiency' have both
been  achieved. 

The  concept  of  'portability'  is  not  new  in  the 
discipline  of  computer  science,  and  is  certainly  an 
important  one.  Surprisingly,  such  an  important  concept  has 
not  received  much  attention  in  the  design  of  efficient 
algorithms  for  parallel  computer  models.  Although  some 
portable  algorithms  have  appeared  in  the  literature,  their 
portability  was  made  possible  by  the  simplicity  of  the 
problems,  and  not  the  design  of  the  algorithms.  The  first 
work  (possibly  the  only  work)  emphasizing  the  concept  of 
portable  algorithms  was  an  unpublished  manuscript  of  Miller 
and Stout for the graph-connectivity problem [MILL82].
However, the class of computers on which their algorithm
works efficiently is relatively small. In our opinion, the
MMM proposed in this thesis serves as a good model for
designing efficient portable algorithms. There are several
reasons for this: firstly, it has a great deal of
generality. In this thesis it has been shown that it
includes most of the well-known existing models. Therefore,
any algorithm which runs on the MMM will automatically run
on all those models. Secondly, it has been shown that many
operations can be carried out in O(t(n)) time with H(n)
hardware resources on the MMM. These include the prototype
operations like sorting, labelling the vertices, computing
partial sums, finding the maximum and minimum, etc. We have
seen that t(n) is also the lower time bound for graph
theoretic  problems  on  many  of  the  existing  computer  models. 
As  a  result,  we  may  employ  any  of  these  prototype  operations 
freely  in  designing  graph  algorithms  on  the  MMM  as  the  time 
they  consume  is  always  within  the  optimal  time  bound.  In 
other  words,  we  stand  a  good  chance  of  getting  optimal  graph 
algorithms  on  the  MMM.  Thirdly,  matrix  multiplication  is  a 
basic  yet  important  operation.  Its  central  role  in  many 
scientific  applications  is  widely  recognized.  Any  computer 
model  whose  design  is  unsuitable  for  matrix  multiplication 
will  be  of  limited  usefulness.  For  this  reason,  it  may  be 
justified  to  say  that  any  general  purpose  computer  model  is 
an  MMM.  Finally,  due  to  the  uniform  nature  of  ordinary 
matrix  multiplication,  the  MMM  should  be  easily  constructed 
(note  that  the  processors  need  not  have  expensive 
multiplication  capability).  The  number  of  processors 
required is also reasonable, since otherwise it may not be
possible to realize matrix multiplication efficiently on
parallel computer models.

We  feel  that  the  directed  spanning  forest  problem 
deserves  more  attention.  The  importance  of  this  problem 
seems to have been overlooked after the search for an efficient
(sublinear time) parallel depth-first search algorithm was
unsuccessful. In fact, the importance of this problem is
easy  to  appreciate  as  the  directed  spanning  forest  provides 
a  framework  upon  which  global  information  can  be  organized 
and  transferred  from  vertex  to  vertex  within  the  graph.  The 
success  of  the  depth-first  search  technique  (which  creates  a 
directed  spanning  forest)  in  designing  optimal  algorithms 
for  the  sequential  RAM  gives  strong  support  to  this  view. 

The  fact  that  the  directed  spanning  forest  for  the  PRAM  and 
the  directed  BFS  spanning  forest  for  the  MMM  serve  as  the 
backbone  of  all  of  our  algorithms  provides  further  evidence. 
Moreover,  in  the  course  of  developing  our  algorithms,  we 
observed  that  the  execution  times  of  our  algorithms  are 
dominated  by  the  directed  spanning  forest  algorithm.  This  is 
because  with  the  exception  of  the  steps  for  finding  a 
directed  spanning  forest  and  for  determining  the  connected 
components  of  an  undirected  graph  G,  we  have  ensured  that 
all  the  steps  in  our  algorithms  run  in  optimal  time.  But  we 
have  shown  in  Theorem  5.3  that  the  connected  component 
problem  can  be  reduced  to  a  directed  spanning  forest 
problem. Therefore the optimality of our algorithms depends
on our ability to develop an optimal directed spanning
forest algorithm. In other words, we may reduce the problem
of finding an optimal time algorithm for any of the graph
theoretic problems investigated in this thesis to that of
finding a directed spanning forest of an undirected graph.
This may explain why, in Chapter 5, whenever there is an
improvement in the directed spanning forest algorithm, there
is automatically an improvement in all the other algorithms.

In  view  of  the  importance  of  the  directed  spanning 
forest  problem,  we  summarize  our  results  on  this  problem  and 
propose  several  related  open  problems  below. 

1. A directed spanning forest can be found in O(lg^2 n) time
with n⌈n/lg^2 n⌉ processors. This result is optimal for
dense graphs with respect to the time-processor
product. (Chapter 2)

2. A directed spanning forest can be found in O(lg n)
probabilistic time with |E|n^3 lg n processors on the
PRAM. (Chapter 5)

3. A directed BFS spanning forest can be found in
O(lg n · lg lg n) expected time on the PRAM and in O(lg lg n)
expected time on the WRAM with n^3 processors. (Chapter
3)

4. A directed BFS spanning forest can be found in O(lg n · lg d)
time with H(n) hardware resources on the MMM, where d is
the diameter of the given graph.

Besides the direct way of finding a directed spanning
forest, two alternative indirect ways have been used in this
thesis. The first is to find a minimum spanning forest for
the given graph (note that the minimum spanning forest does
not convey global information efficiently) and then convert
it into a directed forest by constructing a directed BFS
spanning forest in it. This technique was described in Lemma
3.17 and was employed in Chapter 5. The second way is to use
the all-pair shortest path algorithm. This technique has
been used in Chapter 3 to produce a directed BFS spanning
forest.

The  following  are  open  problems: 

1. Can a directed spanning forest be found in O(lg n) time
with n⌈n/lg n⌉ or even ⌈|E|/lg n⌉ processors on the PRAM?
Note that solving this problem implies solving all the
graph theoretic problems investigated in this thesis in
optimal time using an optimal number of processors on
the PRAM.

2. Can the number of processors used by the O(lg n) time
probabilistic algorithms be reduced?

3. Can the expected time complexities be improved or the
number of processors used be reduced?

4. Can the time complexity be improved on the MMM?

Since the majority of the problems investigated in this
thesis are related to the connectivity property of graphs,
it is natural to ask if the k-connectivity (k≥3) problems can
be solved efficiently. The best previously known parallel
algorithm for testing if a graph is k-connected (k≥3) on the
PRAM takes O(lg^2 n lg k) time with O(n^(k+1)) processors [GOLD77]
or O(lg n) probabilistic time with n^O(1) processors [REIF82a].
No algorithms were known for other parallel computer models.
Using the results obtained in the previous chapters for the
biconnectivity problem and the following lemma, it is easily
shown that testing if an undirected graph is
k-connected (k≥3) can be done in O(lg^2 n) time with
n^(k-1)⌈n/lg^2 n⌉ processors or in O(lg n) probabilistic time with
|E|n^(k+1) lg n processors on the PRAM. The worst case and the
expected time complexities for the MMM can be similarly
derived.

Lemma 6.1: An undirected graph G is k-connected (k≥3) iff
∀(v_1,v_2,...,v_{k-2})∈V^(k-2), G[v_1,v_2,...,v_{k-2}] is biconnected,
where G[v_1,v_2,...,v_{k-2}] is obtained from G by removing the
k-2 vertices v_1, v_2, ..., v_{k-2} and all the edges incident
with these vertices from G.

Proof: Trivial. ■
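
The reduction in Lemma 6.1 can be spelled out as a brute-force test, sketched below. The sketch rests on assumptions that are not in the lemma itself: it enumerates sets of k-2 distinct vertices rather than tuples, it treats a graph with fewer than three remaining vertices as not biconnected, and it tests biconnectivity naively by deleting one vertex at a time. The names are illustrative, and the code makes no attempt to match the processor bounds quoted above.

```python
from itertools import combinations


def is_connected(adj, removed):
    """Connectivity of the graph induced on the vertices not in removed."""
    vertices = [v for v in adj if v not in removed]
    if not vertices:
        return True
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def is_biconnected(adj, removed=frozenset()):
    """Connected, at least 3 vertices (assumption), no articulation point."""
    remaining = [v for v in adj if v not in removed]
    if len(remaining) < 3 or not is_connected(adj, removed):
        return False
    return all(is_connected(adj, removed | {v}) for v in remaining)


def is_k_connected(adj, k):
    """Lemma 6.1 as a test: delete every (k-2)-subset, check biconnectivity."""
    if k < 3:
        raise ValueError("the lemma is stated for k >= 3")
    return all(is_biconnected(adj, frozenset(c))
               for c in combinations(adj, k - 2))
```

The number of subsets examined grows like n^(k-2), which is the source of the extra processor factor in the bounds derived from the lemma.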

Finding the k-connected components for k≥4 is of no
practical interest. However, for k=3, the problem is closely
related to the planar graph problem, which has applications in
electrical engineering. The previous best algorithm for the
3-connected components takes O(lg^2 n) time with n^4 processors
on the PRAM [JAJA82]. No algorithms for other parallel
computer models were known. Using Lemma 6.1, it is easily
shown that we can improve Ja'Ja' and Simon's 3-connected
components algorithm by reducing the number of processors
used by a factor of lg^2 n.

Despite the fact that our results obtained here for the
k-connectivity problem (k≥3) are improvements over the
previous  results,  we  do  not  regard  them  as  achievements 
because  the  method  proposed  by  Lemma  6.1  is  essentially  a 
brute  force  method,  let  alone  the  fact  that  the  results  are 
not  optimal  (In  fact,  all  the  previous  results  stated  above 
are  to  a  great  extent,  brute  force  methods). 

At this point, it is interesting to review the results
for these problems obtained on the sequential RAM. The
sequential algorithms for finding the connected components,
the biconnected components and the triconnected components
all rely on the depth-first search technique and run in
optimal time and space [TARJ72, HOPC73]. We have
developed an optimal (w.r.t. time-processor product)
directed spanning forest algorithm in Chapter 2, based on
which an optimal biconnected component algorithm was
developed, and we have shown in Chapter 4 that the
biconnected component algorithm gives rise to an optimal
sequential algorithm which is a generalization of the
previous optimal sequential algorithm. It is therefore
intriguing to ask: Using the optimal directed spanning
forest algorithm, can we develop an optimal parallel
algorithm for the triconnected component problem on the PRAM
which gives rise to an optimal sequential algorithm which is a
generalization of the existing optimal sequential algorithm?
We leave this as an open problem.

APPENDIX : Some Detailed Implementations


Algorithm  DSF 


(* To find an inverted spanning forest in an undirected graph *)

stage 1

{ Variable declarations }

M : array[1..n, 1..n] of 0..1;
FR+ : array[1..2n-1, 0..n-1] of 1..nlgn;
depth : array[1..2n-1] of 0..n-1;
PTR : array[1..nlgn] of 1..2n-1;
DV : array[0..lgn, 1..n] of 1..n;
rootv : array[1..2n-1] of 1..n;
B : array[1..2, 1..n, 1..n] of 1..n;
flag : array[1..n] of 0..1;
D, C : array[1..n] of 1..n;
phase : 1..lgn; startpt : 1..2n-1;

step 1: { initialization }
for all i: 1≤i≤n pardo
    DV[0,i] := D[i] := i; flag[i] := 0
dopar;

for all i: 1≤i≤nlgn pardo PTR[i] := 0 dopar;
for all i: 1≤i≤2n-1 pardo
    FR+[i,0] := FR+[i,1] := 0;
    rootv[i] := 0
dopar;

for all i,j: 1≤i,j≤n pardo
    B[1,i,j] := i; B[2,i,j] := j
dopar;

phase := 0; startpt := 0;


repeat

step 2(a):
    { Pack all defined rows in each segment together }
    S := { i | flag[i] = 0 };

    { Set pointers in array PTR. second is a function
      extracting the second portion of a variable formed
      by the function concat in the preceding step. }
    temp := second(sort({ concat(flag[i], i) | 1≤i≤n }));
    PTR[phase*n+1 .. (phase+1)*n] := second(sort(
        { concat(temp[i], startpt+i) | 1≤i≤|S| } ∪
        { concat(temp[i], 0) | |S|<i≤n }));

    startpt := startpt + n/2**phase;


step 2(b):

    for all i ∈ S pardo



        j0 := min{ j | M[i,j]=1, j ∈ S }
        if none then j0 := i;
        C[i] := j0;
        FR+[PTR[phase*n+i], 0] := phase*n+i;
        FR+[PTR[phase*n+i], 1] := phase*n+j0
    dopar;


step 3(a):
    { Check to see if the set S can be reduced any further;
      if not, then terminate execution }
    if (for all i ∈ S, C[i]=i) then exit;

step 3(b):
    for all i ∈ S pardo if C[i]=i then flag[i] := 1 dopar;
step 4:
    for all i ∈ S pardo D[i] := C[i] dopar;

step 5:
    for j := 1 step 1 until lgn do
        for all i ∈ S pardo C[i] := C[C[i]] dopar;

step 6(a):
    for all i ∈ S pardo D[i] := min{ C[i], D[C[i]] } dopar;

step 6(b):
    for all i: 1≤i≤n pardo D[i] := D[D[i]] dopar;

step 6(c): { Record the array D[i], 1≤i≤n }
    for all i: 1≤i≤n pardo
        if i ∈ S
        then DV[phase+1, i] := D[i]
        else DV[phase+1, i] := D[DV[phase, i]]
    dopar;

step  6(d) :{  Convert  the  edge  from  the  smallest-numbered 
vertex  of  each  /-tree-loop  to  a  self-loop  } 
    for all i: D[i]=i pardo
        FR+[PTR[phase*n+i], 1] := FR+[PTR[phase*n+i], 0]
    dopar;

step 7(a):
    for all i ∈ S pardo
        for all j ∈ S : j=D[j] pardo
            Choose any j0 ∈ S such that D[j0]=j and M[i,j0]=1
            if none then j0 := j;
            M[i,j] := M[i,j0];
            B[1,i,j] := B[1,i,j0];
            B[2,i,j] := B[2,i,j0]
        dopar
    dopar;


step 7(b):
    for all j ∈ S : j=D[j] pardo
        for all i ∈ S : i=D[i] pardo
            Choose any i0 ∈ S such that D[i0]=i and M[i0,j]=1
            if none then i0 := i;
            M[i,j] := M[i0,j];
            B[1,i,j] := B[1,i0,j];
            B[2,i,j] := B[2,i0,j]
        dopar
    dopar;

step 7(c):
    for all i ∈ S pardo M[i,i] := 0 dopar;

step 8:
    for all i ∈ S pardo if D[i]≠i then flag[i] := 1 dopar;
    phase := phase+1;
until (phase ≥ lgn);


stage  2 

step 1: { Evaluate the array FR+ }

    Compute FR+ and depth[i] for 1≤i≤2n-1.

step 2:

    phase := phase-1;

    { Note that at this point, each vertex k left in S is
      the root of an in-tree recorded in the 'last' segment }

    for all k: k ∈ S pardo
        rootv[PTR[phase*n+k]] := k
    dopar;
repeat

    for all i: (phase*n+1 ≤ i ≤ (phase+1)*n
            and PTR[i] ≠ 0
            and FR+[PTR[i], (n-1)-depth[i]]
                ≠ FR+[PTR[i], (n-1)-depth[i]+1]) { not self-loop }
    pardo { Output all the edges except the one
            emitting from the new root first }

        { Denoting FR+[PTR[i], (n-1)-depth[i]] mod n
          and FR+[PTR[i], (n-1)-depth[i]+1] mod n by
          v0[i] and v1[i] respectively }
        if rootv[PTR[i]] = 0 then
        begin
            T[1, B[1,v0[i],v1[i]]] := B[1,v0[i],v1[i]];
            T[2, B[1,v0[i],v1[i]]] := B[2,v0[i],v1[i]];
        end;

        { Define the roots for the next segment }
        if phase > 0
        then rootv[PTR[DV[phase-1, B[1,v0[i],v1[i]]] +
                   (phase-1)*n]] := B[1,v0[i],v1[i]];
        { Reverse the edges if necessary }
        if rootv[PTR[i]] ≠ 0
        then for all j: ((n-1)-depth[i] ≤ j < (n-1))
            pardo { Denoting FR+[PTR[i], j] mod n and
                    FR+[PTR[i], j+1] mod n by v0[j] and v1[j]
                    respectively }

                T[1, B[2,v0[j],v1[j]]] := B[2,v0[j],v1[j]];
                T[2, B[2,v0[j],v1[j]]] := B[1,v0[j],v1[j]];

                { Redefine the roots as well }
                if phase > 0 then
                begin
                    rootv[PTR[DV[phase-1, B[1,v0[j],v1[j]]]
                          + (phase-1)*n]] := 0;
                    rootv[PTR[DV[phase-1, B[2,v0[j],v1[j]]]
                          + (phase-1)*n]] := B[2,v0[j],v1[j]]
                end
            dopar
    dopar;


    { Pass the roots defined in the current and previous
      segments to the next segment }
    for all i: (phase*n+1 ≤ i ≤ (phase+1)*n
            and PTR[i] ≠ 0 and rootv[PTR[i]] ≠ 0)
    pardo
        rootv[PTR[DV[phase-1, rootv[PTR[i]]] + (phase-1)*n]]
            := rootv[PTR[i]]
    dopar;

    phase := phase-1;
until (phase < 0);
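
The hooking and pointer-jumping steps of stage 1 (steps 2(b), 4, 5 and 6(a)) can be illustrated by a sequential simulation. The Python sketch below is not the thesis implementation: the function name `hook_and_jump` is hypothetical, vertices are numbered from 0 rather than 1, and each pardo is simulated by reading from a copy of the array so that every read sees the values held before the step, as on a synchronous PRAM. It assumes the adjacency matrix is symmetric.

```python
import math


def hook_and_jump(M, S=None):
    """Return rep[i], the smallest-numbered vertex on the 2-cycle (or
    self-loop) that vertex i reaches after hooking and pointer jumping.

    M is a symmetric 0/1 adjacency matrix; S is the set of active
    vertices (all vertices when omitted). Vertices are 0-based here.
    """
    n = len(M)
    if S is None:
        S = range(n)
    S = list(S)

    # step 2(b): hook each active vertex onto its smallest active neighbour,
    # or onto itself when it has none.
    C = list(range(n))
    for i in S:
        nbrs = [j for j in S if M[i][j] == 1]
        C[i] = min(nbrs) if nbrs else i

    # step 4: remember the original hooks.
    D = C[:]

    # step 5: ceil(lg n) rounds of pointer jumping, C[i] := C[C[i]].
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    for _ in range(rounds):
        old = C[:]  # all processors read before any processor writes
        for i in S:
            C[i] = old[old[i]]

    # step 6(a): C[i] and D[C[i]] are the two vertices of the cycle
    # reached from i; keep the smaller one.
    rep = list(range(n))
    for i in S:
        rep[i] = min(C[i], D[C[i]])
    return rep
```

For a path on four vertices every vertex ends up with representative 0, while two disjoint edges give two representatives, one per component of the hooked graph.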


Algorithm Bridges(M, bridge);

Input : The adjacency matrix M of a connected, undirected
graph G(V,E);

Output: An n×n matrix bridge[1..n, 1..n] such that
bridge[i,j] = 1 iff (i,j) is a bridge in G;

Step 1: Call inverted-spanning-forest(M,T);
Step 2: Find HLCA[i] (using the method presented in Section
8) and depth[i], ∀ i ∈ V, then compute dHLCA[i], ∀ i ∈ V;
Step 3: for all i,j: (i,j) ∈ V×V
        pardo bridge[i,j] := bridge[j,i] := 0 dopar;
        for all e: e=(a,b) ∈ E' { e is in T }
        pardo
            for all i: i ∈ V
            pardo
                if F+[i, (n-1)-depth[a]] = a { a is an
                   ancestor of i } and dHLCA[i] < depth[a]
                then B[a,i] := 0
                else B[a,i] := 1
            dopar
            bridge[a,b] := bridge[b,a] := ∧{ B[a,i] | i ∈ V }
        dopar;
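
Step 3 of Algorithm Bridges can be checked on small inputs with a sequential sketch. The Python code below rests on several assumptions that are not stated in the listing: `a` is taken to be the child endpoint of the tree edge (a,b), the root is represented as its own father, `dhlca` is supplied by the caller as the result of Step 2, and the ancestor test walks father pointers instead of indexing F+. The function name `tree_bridges` is hypothetical and vertices are 0-based.

```python
def tree_bridges(father, depth, dhlca):
    """Return the set of tree edges (a, father[a]) that Step 3 marks as bridges.

    father[i] is the father of i in the spanning tree (father[root] == root),
    depth[i] is the depth of i, and dhlca[i] is the value dHLCA[i] from Step 2.
    """
    n = len(father)

    def is_ancestor(a, i):
        # Walk from i towards the root; a vertex counts as its own ancestor.
        v = i
        while True:
            if v == a:
                return True
            if father[v] == v:
                return False
            v = father[v]

    bridges = set()
    for a in range(n):
        if father[a] == a:
            continue  # the root has no tree edge to a father
        # B[a,i] = 0 exactly when a is an ancestor of i and dHLCA[i] < depth[a];
        # the edge is reported when the AND over all i is 1.
        if all(not (is_ancestor(a, i) and dhlca[i] < depth[a])
               for i in range(n)):
            bridges.add((a, father[a]))
    return bridges
```

As a usage example, take the tree path 0, 1, 2, 3 rooted at 0 with one extra non-tree edge between 2 and 0. With `dhlca = [0, 1, 0, 3]` (an input chosen by hand for this illustration), only the edge (3, 2) is reported.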


Algorithm LCA(T, LCA+, depth);

Input : the vector T for a directed tree T such that
T[i]=j iff j is the father of i in T.

Output: The ordered pair (LCA+, F+);

Step 1: Compute F+;

Step 2: { Find the lowest common ancestor for (a,b) where
          (a,b) ∈ V'×V' based on F+ and binary search }
        for all (a,b) ∈ V'×V'
        pardo
            ptr := ⌊n/2⌋; l := 0; u := n-1;
            for t := 1 step 1 until ⌈lgn⌉+1 do
            begin
                if F+[a,ptr] = F+[b,ptr]
                then {move left} u := ptr
                else {move right} l := ptr+1;
                ptr := ⌊(u+l)/2⌋;
            end;
            lca+[a,b] := ptr
        dopar;
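
The binary search of Step 2 can be tried out sequentially for a single pair (a,b). The Python sketch below is an illustration under stated assumptions, not the thesis code: the table layout (vertex i at column (n-1)-depth[i], the root at column n-1) is inferred from the indexing used in Algorithm Bridges, unused table entries are padded with None and treated as unequal, the root is represented as its own father, and a while loop replaces the fixed ⌈lgn⌉+1 iterations. The names `build_ancestor_table` and `lca_column` are hypothetical.

```python
def build_ancestor_table(father):
    """Build F, where F[i][(n-1)-depth[i]+k] is the k-th ancestor of i.

    father[root] == root is assumed. Unused entries are None.
    """
    n = len(father)
    depth = [0] * n
    for i in range(n):
        v = i
        while father[v] != v:
            v = father[v]
            depth[i] += 1
    F = [[None] * n for _ in range(n)]
    for i in range(n):
        v = i
        for col in range(n - 1 - depth[i], n):
            F[i][col] = v
            v = father[v]
    return F, depth


def lca_column(F, a, b):
    """Return the leftmost column where the rows of a and b hold the same vertex.

    The rows agree from that column rightwards, so a lower-bound binary
    search applies; F[a][result] is the lowest common ancestor.
    """
    lo, hi = 0, len(F) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if F[a][mid] is not None and F[a][mid] == F[b][mid]:
            hi = mid          # move left
        else:
            lo = mid + 1      # move right
    return lo
```

For the tree with father vector [0, 0, 0, 1, 1, 2], the call `lca_column(F, 3, 4)` returns the column that holds vertex 1 in both rows, so the lowest common ancestor of 3 and 4 is `F[3][lca_column(F, 3, 4)]`.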